Abstract

Using level-sensitive latches instead of edge-triggered registers for storage elements in a synchronous system can lead to faster and less expensive circuit implementations. This advantage derives from an increased flexibility in scheduling the computations performed by the circuit. In edge-clocked circuits the amount of time available for the computation between two registers is precisely the length of the clock cycle, while in circuits using level-sensitive latches a computation can borrow time across latches, thus reducing the amount of dead time in the clock cycle. In either type of circuit, achieving maximum performance requires locating the storage elements in such a way as to spread the computation uniformly across a number of clock cycles.

Retiming is the process of rearranging the storage elements in a circuit to reduce the cycle time or the number of storage elements without changing the interface behavior of the circuit as viewed by an outside host. Retiming in effect reschedules the circuit computations in time based on the length of those computations. In this paper, we extend the retiming techniques developed for edge-clocked circuits by Leiserson, Rose and Saxe to a general class of multi-phase, level-clocked circuits. We first describe this class of well-formed circuits and define what it means for a well-formed, level-clocked circuit to operate correctly. We then show that a set of constraints can be efficiently derived for a circuit which preserve its correctness under retiming. These constraints can then be used to retime a level-clocked circuit using efficient integer linear programming techniques similar to those used for edge-clocked circuits.

1  Introduction 

Synchronous  circuits  rely  on  clocked  storage  elements  to  hold  values  while  computation  is  performed 
on  them.  The  most  widely  used  storage  element  is  the  edge-triggered  register  which  samples  its 
input  at  the  beginning  of  each  clock  period,  holding  that  value  for  the  entire  clock  period.  Edge- 
triggered  registers  provide  a  straightforward  way  to  analyze  the  minimum  clock  period  of  a  circuit 
by  determining  the  maximum  delay  between  any  two  registers.  This  simplified  timing  analysis  leads 
to  efficient  retiming  techniques  for  adjusting  the  placement  of  registers  to  optimize  the  cycle  time 
or the number of registers [6, 7].

Level-clocked circuits are synchronous circuits that use level-sensitive latches. These latches are clocked storage elements that allow the inputs to flow through the latch during the active phase of the clock, latching the value during the inactive phase. In level-clocked circuits it is less clear how much time is available to the computation placed between latches because the input values may arrive early and flow through the input latch. This borrowing of time between clock cycles makes the determination of the constraints on the clock period difficult. However, the flexibility in scheduling the computation provides more opportunity to optimize the clock period than in the case of edge-clocked circuits.

This difference in scheduling is shown by the example circuits in Figures 1 and 2, where the same computation is implemented using an edge-clocked circuit in the first case and a level-clocked circuit in the second. The circuit of Figure 2 uses a two-equal-phase, non-overlapping clock with each edge-triggered register replaced by a pair of $\phi_1$, $\phi_2$ latches. The edge-clocked circuit of Figure 1 shows an optimal placement of registers which achieves a cycle time of $T_\Phi = 8$. By contrast, the level-clocked circuit of Figure 2 shows an optimal placement of latches that achieves a cycle time of $T_\Phi = 6$. This level-clocked circuit is also cheaper. Assuming that the cost of an edge-triggered register is $R$ and that of a latch is $R/2$, then the storage element cost for the edge-clocked circuit is $3R$ while that of the level-clocked circuit is only $2R$. The flexibility provided by level-sensitive latches can also be used to reduce cost while meeting a particular clock period. For instance, if the circuit in Figure 2 is to be retimed with $T_\Phi = [\ldots]$, the latches on the edges between nodes [...] may be moved to the single edge $v_1 \to v_2$, reducing the storage element cost to $R$, one-third that of the slower optimal edge-clocked circuit.

Figure 1: A simple circuit optimally timed using edge-triggered registers and the resulting clock schedule. (Panel: Schedule for Circuit Simple-A.)

Figure 2: The simple circuit now timed using level-sensitive latches. (Panels: Circuit Simple-B; Schedule for Circuit Simple-B with a two-equal-phase clock.)

In this paper we show how the retiming techniques developed by Leiserson et al. for edge-clocked circuits can be extended to optimize level-clocked circuits. Figure 3 shows the edge-clocked correlator example from their paper in its original state, which can be retimed to the circuit in Figure 4 which operates with a clock period of 13. We can convert the circuit of Figure 3 into an equivalent level-clocked circuit by using a two-equal-phase, non-overlapping clock schedule and replacing each register with a pair of $\phi_1$, $\phi_2$ latches. This circuit can be retimed to the one in Figure 5 using the retiming techniques described in this paper to achieve an optimal clock period of 10. The retiming techniques we describe also handle more complex clock schedules with multiple phases and phase overlap and underlap. For example, retiming the correlator example to a two-equal-phase clock with 10% underlap between phases achieves a clock period of 10.4. These techniques can also be extended to clock schedules with unequal length phases through a technique of adding tightly constrained variables to the system which contain information regarding the current phase of nodes in the circuit graph.

Figure 3: The correlator circuit from Leiserson et al. in its initial configuration with registers shown as solid bars.

Figure 4: The edge-clocked correlator circuit optimally retimed to a clock period $T_\Phi = 13$.

Figure 5: The correlator circuit optimally retimed using a two-equal-phase clock. Latches are represented by solid circles and marked with controlling clock phase. The resulting clock cycle time is 10 units.

There  are  a  number  of  obstacles  to  level-clocked  retiming  each  of  which  is  explored  in  this  paper. 
These  include: 

•  Circuit  Correctness:  The  definition  of  a  correctly  operating  circuit  may  vary  widely  depending 
on  latch  phasing  and  clock  schedule.  The  variety  of  clocking  strategies  possible  causes  a 
general  retiming  technique  for  level-clocked  circuits  to  be  much  more  complex  than  required 
for  typical  cases.  We  restrict  the  techniques  in  this  paper  to  common  circuit  structures  and 
take  corresponding  advantage  of  those  structures  to  simplify  the  retiming  techniques. 

•  Minimum vs Maximum Delay Constraints: In an issue related to circuit correctness, some circuit structures combined with particular clock schedules impose minimum as well as maximum delay constraints on combinational logic paths. In this work we restrict legal circuits and clock schedules such that this additional complexity does not arise.

•  Identification of Critical Cycles: As demonstrated in Figure 2, time allocated to a combinational logic block may be shared across the active period of a latch. We will show that path-based constraints which allow the flexibility to share across latches do not sufficiently bound the computational time available around a cycle. Instead, cycles in the circuit graphs form an independent lower bound on possible clock periods. We provide a technique for identifying this lower bound initially so that only path-based constraints need be considered above the Critical Cycle period.

•  Identification of Critical Paths: An additional impact of computational time sharing is that critical paths between two nodes in a circuit graph may differ from those identified for edge-clocked retiming. Moreover, critical paths in a level-clocked graph are not the same for all clock periods. We provide a new definition of critical paths necessary for correct retiming of level-clocked graphs and provide a technique for identifying the critical paths based on that definition.


•  Constraints on Higher Weight Paths: Computational time sharing requires constraints on paths of non-zero weight in the circuit graph which are not redundant to constraints on zero-weight paths as they were in edge-clocked circuits. Techniques for correctly generating higher order constraints are provided.


•  Constraints Dependent on Phase of Latch Placement: In clock schedules utilizing unequal phases, the maximum delay constraints for a given computation path may differ depending on the phase of latches placed along the path. Techniques for writing constraints that correctly restrict maximum delay dependent on latch placement are provided as well as modifications of existing algorithms required to efficiently solve the more complex constraint sets.
1.1 Overview of the Paper

We first review the work of Leiserson et al. [6, 7] on which our work is based and present the underlying circuit graph model. Next we review the clock model we have adopted from the work of Sakallah, Mudge and Olukotun [5]. We then describe the class of well-formed level-clocked circuits to which we will be limited and define what it means for a level-clocked circuit to operate correctly. In Section 5 we then use this model to derive the set of constraints that fully specify the multi-phase, maximum delay timing restrictions of level-clocked circuits. Section 6 applies these timing restrictions to circuits using multi-phase clocks with equal phases to form sets of ILP constraints which restrict the movement of latches through circuit graphs. Finally Section 7 extends our techniques to handle valid clock schedules with arbitrary length phases.

2  Background 

In this section, we briefly review the terminology and graph model of digital circuits described in Leiserson et al. [6] and extend it to handle level-sensitive latches. We then review the basic retiming results of their paper. The reader is encouraged to read [6, 7] for full details.

A circuit is modeled as a directed multigraph $G = (V, E, w, d, s)$ whose vertices $V$ model the functional elements of the circuit and whose edges $E$ model the interconnections between the functional elements. Each vertex $v$ is given a delay $d(v)$ that is associated with the corresponding functional element. A unique host vertex $v_h$ with $d(v_h) = 0$ is used to represent the environment of the circuit. Each edge is given a weight $w(e)$ which is the number of registers along the connection. This notion of edge weight is sufficient for edge-clocked circuits which use a single register type, but must be extended for level-clocked circuits which use latches controlled by different clock phases. We do this by associating with each edge $e$ the sequence $s(e) = \langle l_1, l_2, \ldots, l_{w(e)} \rangle$ of latches along the connection.

The notation $u \xrightarrow{e} v$ is used to represent an edge $e$ from vertex $u$ to vertex $v$. A path $p$ in the circuit graph is a sequence of vertices and edges from a vertex $u$ to a vertex $v$ and is denoted by $u \overset{p}{\leadsto} v$. A simple path contains no vertex twice. For level-clocked circuits, we also refer to paths that begin at a latch $l$ and end at a latch $m$, for which we use the notation $l \overset{p}{\leadsto} m$.

The weight $w(p)$ of a vertex-terminated path $p = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$ is the count of registers or latches along the path, that is, the sum of the edge weights along the path: $w(p) = \sum_{i=0}^{k-1} w(e_i)$. We define the sequence of latches along the path with $k$ edges to be the concatenation of the edge latch sequences along the path: $s(p) = \Vert_{i=0}^{k-1} s(e_i)$. Thus for a vertex-terminated path $p$, $w(p) = |s(p)|$.

For a latch-terminated path $p = l \xrightarrow{e_{-1}} v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots v_k \xrightarrow{e_k} m$ that begins at a latch $l \in s(e_{-1})$ and ends at a latch $m \in s(e_k)$, the path latch sequence $s(p)$ begins with the tail of $s(e_{-1})$ (beginning with $l$) and ends with the head of $s(e_k)$ (ending with $m$). Unlike the vertex-terminated path, the weight $w(p)$ of a latch-terminated path $l \overset{p}{\leadsto} m$ is defined to be $|s(p)| - 2$; that is, the initial and final latches are not included in the path weight.

The weight of a vertex-terminated cycle $c = v_0 \to v_1 \to \cdots \to v_0$ is identical to the weight of the same sequence of edges and vertices treated as a path. However, in the case of a cycle beginning and ending at a latch $l$, the weight of a path $w(l \leadsto l)$ does not include the beginning and ending latch. Thus $w(c) = w(p) + 1$ where $c$ is a cycle beginning and ending at latch $l$ and $p$ is the same cycle treated as a path beginning and ending at latch $l$.
The delay $d(p)$ of a path $p = v_0 \to v_1 \to \cdots \to v_k$ is the sum of the delays of the vertices along the path: $d(p) = \sum_{i=0}^{k} d(v_i)$. The delay $d(c)$ of a cycle $c = v_0 \to v_1 \to \cdots \to v_{k-1} \to v_0$ includes the delay of node $v_0$ only once, hence $d(c) = \sum_{i=0}^{k-1} d(v_i)$.
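The weight and delay definitions above can be illustrated with a minimal Python sketch. The toy graph, its vertex names and its delay values are assumptions made purely for illustration, and a plain dictionary keyed by vertex pairs is used, so parallel edges of a true multigraph are not represented.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: vertex delays d(v) and edge weights w(e).
delay: Dict[str, int] = {"vh": 0, "v1": 3, "v2": 5, "v3": 2}
weight: Dict[Tuple[str, str], int] = {
    ("vh", "v1"): 1,
    ("v1", "v2"): 0,
    ("v2", "v3"): 1,
    ("v3", "vh"): 0,
}


def path_weight(path: List[str]) -> int:
    """w(p): sum of the edge weights along a vertex-terminated path."""
    return sum(weight[(a, b)] for a, b in zip(path, path[1:]))


def path_delay(path: List[str]) -> int:
    """d(p): sum of the delays of every vertex on the path."""
    return sum(delay[v] for v in path)


def cycle_delay(cycle: List[str]) -> int:
    """d(c): like d(p), but the repeated first vertex is counted only once."""
    if cycle[0] != cycle[-1]:
        raise ValueError("a cycle must start and end at the same vertex")
    return sum(delay[v] for v in cycle[:-1])
```

Treating the same vertex list as a path or as a cycle differs only in whether the delay of the starting vertex is counted twice.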

2.1  Correct  Operation 

In order to retime a circuit, whether edge-clocked or level-clocked, a definition of correct operation must exist. This allows registers to be moved within an initial circuit graph and the definition to be used to determine whether the result operates correctly with respect to the initial circuit. For edge-clocked circuits, a simple definition of correct operation is used for retiming which requires that the following conditions be maintained:

•  C1. For any path $p$ in $G$, if $d(p)$ > clock period, then $w(p) \geq 1$.

•  C2. For any cycle $c$ in $G$, $w(c) \geq 1$.

Retiming a circuit is the process of transforming a circuit graph $G$ into another graph $G_r$ by relocating registers (or latches) such that the input/output behaviors of $G$ and $G_r$ are identical. Transforming a circuit $G$ into a corresponding retimed circuit $G_r$ can be viewed as assigning a retiming (or lag) value $r(v)$ to each of the vertices of $G$. This retiming value represents the number of registers (latches) removed from the output edges of vertex $v$ and added to the input edges. More formally, for any edge $u \xrightarrow{e} v$, $w_r(e) = w(e) + r(v) - r(u)$.

The movement of registers in retiming introduces an additional aspect of correctness which is the relative difference in the weight of two paths between the same two vertices. For example, assume two distinct paths $u \overset{p}{\leadsto} v$ and $u \overset{q}{\leadsto} v$. In order to preserve the logical structure of the circuit the difference in path weight $w(p) - w(q)$ must be preserved during retiming. Leiserson et al. show that retiming by assigning retiming values to vertices maintains a constant difference between the weight of paths with the same endpoints and a constant number of registers on any cycle in the graph. Using the same result, correctness condition C2 is also maintained.
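The edge-weight formula $w_r(e) = w(e) + r(v) - r(u)$ and the preservation of cycle weight can be checked with a short Python sketch. The three-vertex ring and the lag values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def retime(weight: Dict[Edge, int], r: Dict[str, int]) -> Dict[Edge, int]:
    """Apply w_r(e) = w(e) + r(v) - r(u) to every edge e = (u, v)."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in weight.items()}


# Hypothetical ring vh -> a -> b -> vh with both registers on the first edge.
weight = {("vh", "a"): 2, ("a", "b"): 0, ("b", "vh"): 0}

# r(a) = -1 moves one register from the input edge of a to its output edge.
retimed = retime(weight, {"vh": 0, "a": -1, "b": 0})

# Adding the same constant to every lag value yields the same retimed graph.
shifted = retime(weight, {"vh": 5, "a": 4, "b": 5})
```

Because the lag terms telescope around any cycle, the total number of registers on the ring is the same before and after retiming.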

The key retiming result of [6, 7] defines the following set of constraints which must be met by a
legal  retiming  of  an  edge-clocked  circuit  graph  using  a  clock  period  c.  These  constraints  are  given 
in  terms  of  the  maximum  delay  along  the  critical  paths  in  G.  A  critical  path  in  an  edge-clocked 
circuit  graph  is  defined  as  a  minimum-weight  path  of  maximum  delay  from  u  to  v.  In  reality  what 
is  being  identified  is  a  particular  path  such  that  if  that  path  is  retimed  correctly  then  all  other 
paths  between  the  same  two  end-points  will  also  be  retimed  correctly.  Note  that  some  sub-paths 
may  not  be  retimed  correctly  but  that  fact  will  be  detected  independently  of  the  overall  path.  The 
edge-clocked definition of critical path is used to define the matrices $W$ and $D$:

$W(u, v) = \min\{w(p) \mid u \overset{p}{\leadsto} v\}$.

The maximum delay on any critical path from $u$ to $v$ is given by
$D(u, v) = \max\{d(p) \mid u \overset{p}{\leadsto} v \text{ and } w(p) = W(u, v)\}$.
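One way to compute $W$ and $D$ for a small edge-clocked graph is an all-pairs shortest-path pass over lexicographically ordered pairs (weight, negated delay). The Python sketch below uses Floyd-Warshall for this; the function name `compute_W_D`, the toy graph and the treatment of the empty path ($W(u,u) = 0$, $D(u,u) = d(u)$) are assumptions of this illustration, and it relies on condition C2 so that no cycle can improve a path.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
INF = float("inf")


def compute_W_D(vertices: List[str], delay: Dict[str, int],
                weight: Dict[Edge, int]):
    """Return (W, D) for a simple graph in which every cycle has weight >= 1."""
    # best[u][v] = smallest (w(p), -(d(p) - d(v))) over all paths p from u to v.
    best = {u: {v: (INF, 0) for v in vertices} for u in vertices}
    for u in vertices:
        best[u][u] = (0, 0)  # empty path: weight 0, delay d(u)
    for (u, v), w in weight.items():
        best[u][v] = min(best[u][v], (w, -delay[u]))
    # Floyd-Warshall; product() varies k slowest, as the algorithm requires.
    for k, u, v in product(vertices, repeat=3):
        (w1, y1), (w2, y2) = best[u][k], best[k][v]
        if w1 < INF and w2 < INF:
            best[u][v] = min(best[u][v], (w1 + w2, y1 + y2))
    W = {(u, v): best[u][v][0] for u in vertices for v in vertices}
    D = {(u, v): delay[v] - best[u][v][1]
         for u in vertices for v in vertices if W[(u, v)] < INF}
    return W, D


# Hypothetical example graph (names and values are illustrative only).
vertices = ["vh", "v1", "v2", "v3"]
delay = {"vh": 0, "v1": 3, "v2": 5, "v3": 2}
weight = {("vh", "v1"): 1, ("v1", "v2"): 0, ("v2", "v3"): 1,
          ("v3", "vh"): 0, ("v1", "v3"): 1}
W, D = compute_W_D(vertices, delay, weight)
```

In this example both paths from v1 to v3 have weight 1, so $D$ records the delay of the slower one through v2.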

We will show that the above definitions of $W$ and $D$ are insufficient to identify critical paths in level-clocked
circuits and in fact the critical path between two end nodes will vary with the clock period
of  interest.  However  the  above  definitions  are  sufficient  for  edge-clocked  circuits  and  using  them 
it  is  possible  to  generate  a  set  of  constraints  on  retiming  of  an  edge-clocked  circuit  such  that  the 


resulting circuit operates correctly under our definition. The constraints on edge-clocked retiming are:

I/O: $r(v_h) = 0$

Positive edge weight: $r(u) - r(v) \leq w(e)$ for all edges $u \xrightarrow{e} v$

Maximum path delay: $r(u) - r(v) \leq W(u, v) - 1$ for all $(u, v)$ for which $D(u, v) > c$

The I/O constraint maintains the I/O behavior under retiming. Although it is not necessary to require that the host vertex have a retiming value of 0, having the retiming value identified at a particular node is useful in some solution methods. To show that it is not necessary to require $r(v_h) = 0$, note that changing all node retiming values by any constant amount results in the same graph. In other words, since the weight of a retimed edge depends on the retiming values only through the difference $r(u) - r(v)$, if all values of $r(u)$ are changed by a constant amount to a value $\hat{r}(u)$ the resultant edge weight values must be identical: $r(u) - r(v) = \hat{r}(u) - \hat{r}(v)$. Thus for any retiming there exists an identical circuit graph such that $r(v_h) = 0$.

The positive edge weight constraints keep the retiming from assigning negative edge weights which have no physical meaning.* The maximum path delay constraints force proper timing by placing at least a single register along any path with delay greater than the clock period of interest. Linear programming techniques can be used to solve this constraint set and return a valid assignment of the retiming variables if one exists. The set of possible optimum clock periods is derived from the delay of critical paths in the graph and a binary search is performed over that set to determine the fastest possible clock period to which the circuit may be retimed.
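The feasibility test and the binary search over candidate periods can be sketched in Python as follows. The text does not name a specific solver; this sketch assumes Bellman-Ford relaxation, a standard method for systems of difference constraints. The function names, the three-vertex ring and its hand-computed $W$ and $D$ tables are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]
Constraint = Tuple[str, str, int]  # (u, v, b) encodes r(u) - r(v) <= b


def retiming_constraints(weight: Dict[Edge, int], W: Dict[Edge, int],
                         D: Dict[Edge, int], c: int) -> List[Constraint]:
    """Edge-weight and maximum-path-delay constraints for clock period c."""
    cons = [(u, v, w) for (u, v), w in weight.items()]
    cons += [(u, v, W[(u, v)] - 1) for (u, v), d in D.items() if d > c]
    return cons


def solve(vertices: List[str], cons: List[Constraint],
          host: str) -> Optional[Dict[str, int]]:
    """Bellman-Ford from an implicit source; None if the system is infeasible."""
    dist = {v: 0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, b in cons:
            if dist[v] + b < dist[u]:
                dist[u] = dist[v] + b
                changed = True
        if not changed:
            # Shift every value so that the I/O constraint r(host) = 0 holds.
            return {v: dist[v] - dist[host] for v in vertices}
    return None  # still relaxing after |V| passes: a negative cycle exists


def min_period(vertices, weight, W, D, host):
    """Binary search over the sorted D values for the smallest feasible period."""
    candidates = sorted(set(D.values()))
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        r = solve(vertices,
                  retiming_constraints(weight, W, D, candidates[mid]), host)
        if r is None:
            lo = mid + 1
        else:
            best, hi = (candidates[mid], r), mid - 1
    return best


# Hypothetical ring vh -> a -> b -> vh with d(vh) = 0 and d(a) = d(b) = 3.
vertices = ["vh", "a", "b"]
weight = {("vh", "a"): 2, ("a", "b"): 0, ("b", "vh"): 0}
W = {("vh", "vh"): 0, ("a", "a"): 0, ("b", "b"): 0, ("vh", "a"): 2,
     ("vh", "b"): 2, ("a", "b"): 0, ("a", "vh"): 0, ("b", "vh"): 0,
     ("b", "a"): 2}
D = {("vh", "vh"): 0, ("a", "a"): 3, ("b", "b"): 3, ("vh", "a"): 3,
     ("vh", "b"): 6, ("a", "b"): 6, ("a", "vh"): 6, ("b", "vh"): 3,
     ("b", "a"): 6}
result = min_period(vertices, weight, W, D, "vh")
```

The binary search is valid because feasibility is monotone in the period: raising $c$ only removes maximum-path-delay constraints.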

The host vertex $v_h$ is a zero-delay vertex defined as the source of all circuit inputs and the destination of all circuit outputs. As a result, additional constraints on circuit timing are imposed along paths which pass through the zero-delay host vertex. These cross-host constraints may overconstrain the actual design by implying relationships between output and input signals which are
not  intended.  If  such  a  relationship  were  intended  it  should  be  represented  as  an  explicit  edge  in 
the  graph  rather  than  an  implicit  and  unavoidable  one. 

The  Correlator  circuit  example  used  in  this  paper  retains  constraints  through  the  host  vertex 
to allow comparison with [6, 7]. The simple circuits in Figures 1 and 2 omit cross-host constraints
and  can  be  thought  of  as  providing  implicit  registers  or  latches  in  the  host  vertex.  Or  it  may 
be  thought  of  as  dividing  the  single  host  vertex  into  two  parts  with  no  edge  between  them.  The 
I/O  constraint  can  be  expanded  to  prevent  retiming  of  any  host  vertex.  Various  additional  input 
and  output  timing  constraints  may  be  represented  by  placing  additional  delay  vertices  on  input  or 
output  edges  and  the  appropriate  constraint  on  their  retiming  value. 

3 Clock Model

We have adopted the clock model of Sakallah, Mudge and Olukotun [5] which provides a convenient way to describe the constraints on multi-phase clocks. A $k$-phase clock is a set of $k$ periodic signals $\Phi = \{\phi_1, \ldots, \phi_k\}$ where $\phi_i$ is referred to as phase $i$ of the clock $\Phi$. All $\phi_i$ have a common cycle time $T_\Phi$. Each phase divides the clock cycle into two intervals as shown in Figure 6: an active interval of duration $T_{\phi_i}$ and a passive interval of duration $(T_\Phi - T_{\phi_i})$. The latches controlled by a clock phase are enabled during its active interval and disabled during its passive interval. The transitions into and out of the active interval are called the enabling and latching edges respectively. We refer to the clock phase controlling latch $l$ by $P(l)$.

*In some applications negative edge weights can be useful as an intermediate step [8].


Figure 6: Diagram from Sakallah et al. showing a clock phase $\phi_i$ and its local time zone.


Associated with each phase is a local time zone such that its passive interval starts at $t = 0$, its enabling edge occurs at $t = T_\Phi - T_{\phi_i}$, and its latching edge occurs at $t = T_\Phi$. The domain of the local time zone is defined to be the interval $(0, T_\Phi]$ since the start of the current clock cycle coincides with the end of the previous cycle. Sakallah et al. additionally introduce an arbitrary global time reference and the value $e_i$ which denotes the time relative to the global time reference at which phase $\phi_i$ ends.

Phases are ordered relative to the global time reference so that $e_1 < e_2 < \cdots < e_{k-1} < e_k$. The global time reference is arbitrarily set such that $e_k = T_\Phi$. The phase sequentially following $\phi_i$ in the clock set is referred to as $\phi_{i+1}$, with $\phi_{k+1} = \phi_1$ and $\phi_{1-1} = \phi_k$.

Finally a phase shift operator is defined:

$$E_{ij} = \begin{cases} (e_j - e_i), & \text{for } i < j \\ (T_\Phi + e_j - e_i), & \text{for } i \geq j \end{cases}$$

which takes on positive values in the range $[0, T_\Phi]$. When subtracted from a timing variable in the current local time zone of $\phi_i$, $E_{ij}$ changes the frame of reference to the next local time zone of $\phi_j$, taking into account a possible cycle boundary crossing (see Figure 7). Because the period of each clock phase is identical and $e_i > e_{i-1}$, the sum of the shifts between all successive phases is $T_\Phi$:

$$\sum_{i=1}^{k} E_{i,i+1} = T_\Phi \qquad (1)$$
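A small Python sketch can confirm that the shifts between successive phases sum to the cycle time. It assumes the second case of the phase shift operator applies whenever $i \geq j$ (so that $E_{ii} = T_\Phi$); the function names and the example schedules are hypothetical.

```python
from typing import List


def phase_shift(e: List[float], T: float, i: int, j: int) -> float:
    """E_ij for 1-based phase indices; e[i - 1] holds the end time e_i."""
    if i < j:
        return e[j - 1] - e[i - 1]
    return T + e[j - 1] - e[i - 1]  # crossing a cycle boundary (i >= j)


def successive_shift_sum(e: List[float], T: float) -> float:
    """Sum of E_{i,i+1} over all k phases, with phase k+1 wrapping to phase 1."""
    k = len(e)
    return sum(phase_shift(e, T, i, i % k + 1) for i in range(1, k + 1))


# Hypothetical schedules: a two-equal-phase clock and a three-phase clock,
# both with cycle time 10 and with e_k equal to the cycle time.
two_phase = [5.0, 10.0]
three_phase = [2.0, 7.0, 10.0]
```

For the two-phase schedule both shifts equal 5, and for either schedule the successive shifts add up to the cycle time of 10.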

We assume that the setup, hold and propagation delay times of latches are zero. The timing characteristics of a given latch may vary as it is moved across combinational logic nodes and thus we treat latches as pass gates followed by a non-inverting buffer of zero delay and infinite drive capability. Refining this simplified latch model is a topic for further research.

Figure 7: The phase shift operator provides the relative difference between times in the local time zones of different phases.

Figure 8: An illustration of a circuit for which it is not clear what the maximum computational time available for logic block CL is. (Latching edges are labeled a, b and c.)

When enabled, the output value of a latch is defined to be equal to its input value. When disabled, the output value remains that of the input at the time of the most recent latching edge. The final parameters of interest are the values for the arrival and departure times of a latch. The arrival time of a signal at latch $l$ is denoted by $A_l$ and the departure time is denoted by $D_l$ in the local time zone. If $A_l > T_\Phi - T_{P(l)}$ then the latch output is undefined over the interval $(T_\Phi - T_{P(l)}, A_l)$. The departure time is given by:

$$D_l = \max\{A_l, T_\Phi - T_{P(l)}\} \qquad (2)$$

and the arrival time at a latch $m$ of a signal from latch $l$ connected by a zero-weight path $p$ is:

$$A_m = D_l - E_{P(l)P(m)} + d(p) \qquad (3)$$

Note  that  this  clock  model  does  not  provide  for  clock  phases  with  differing  periods  nor  for  gated 
clock  signals. 
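The departure and arrival relations can be exercised with a short Python sketch. It assumes that the arrival time at the downstream latch is expressed in that latch's local time zone by subtracting the phase shift between the two phases, as the definition of the phase shift operator suggests. The function names and numeric values are hypothetical.

```python
def departure(arrival: float, T: float, T_active: float) -> float:
    """Departure time: a signal leaves no earlier than the enabling edge."""
    return max(arrival, T - T_active)


def arrival_at_next(depart: float, path_delay: float, shift: float) -> float:
    """Arrival at the next latch, moved into its local time zone (assumed form)."""
    return depart - shift + path_delay


# Hypothetical two-equal-phase clock: cycle time 10, each phase active for 5,
# so the enabling edge sits at local time 5 and the phase shift is 5.
T, T_active, shift = 10.0, 5.0, 5.0

# A signal arriving before the enabling edge waits for the latch to open.
early_departure = departure(3.0, T, T_active)
# A signal arriving while the latch is open flows straight through.
late_departure = departure(7.0, T, T_active)
# With 4 units of logic delay the late signal reaches the next latch at 6.
next_arrival = arrival_at_next(late_departure, 4.0, shift)
```

The second case shows time borrowing: the late signal uses part of the next latch's active interval instead of waiting for a clock edge.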


4  Well-Formed  Circuits  and  Valid  Clock  Schedules 


The  goal  of  the  retiming  process  is  one  of  determining  the  fastest  clock  at  which  latches  may  be 
placed  in  the  circuit  graph  such  that  the  circuit  performs  “correctly”.  Thus  a  definition  must  exist 
of  when  a  retimed  graph  operates  correctly.  General  definitions  of  correctness  for  circuits,  whether 
edge-clocked  or  level-clocked,  are  difficult  to  form  because  the  timing  constraints  which  are  critical 
in  the  initial  circuit  depend  on  the  designer’s  intentions  of  how  that  circuit  is  to  operate.  Given 
an initial circuit, the clock period for which the circuit operates as intended by the designer may only be determined through the use of a restricted definition of correctness to which the designer adhered. For instance in Figure 8 we see a pair of latches with some amount of combinational logic in between. Without some external knowledge it is not possible to state whether the designer intended that the maximum delay through the logic block is limited such that a signal departing from $l$ arrive at $m$ before latching edge a, b or c.

In  this  paper  a  definition  of  correctness  very  similar  to  that  for  edge-clocked  circuits  is  used. 
We first restrict the ordering of latch phases as they occur in the circuit graph to allow a simplified definition of correctness and for retiming constraints to be written which take advantage of knowledge about the graph structure and are least restrictive of the movement of latches during retiming. The resulting “well-formed” circuits form a large and useful class of circuits including
those  that  are  easily  produced  by  automatic  synthesis  tools.  These  restrictions  can  be  eased  by 
placing  appropriate  additional  constraints  on  the  retiming,  but  this  is  not  addressed  in  this  paper. 

A well-formed circuit is one in which the latches occur in clock phase order along any path through the circuit. More precisely, a circuit graph $G$ is well-formed if:

W1. For every path between latches $l \overset{p}{\leadsto} m$ in $G$, if $w(p) = 0$ then $P(m) = P(l) + 1$.

W2. For every cycle $c$ in $G$, $w(c) \geq 1$.

The first constraint simplifies equations defining the minimum weight along a circuit path by forcing any two $n$-weighted paths which end at the same point in the graph to have the identical ordered latch sequence so that any two paths of equal delay ending at the same vertex require the same number of latches. The second constraint is necessary to avoid races and is the same as that required for edge-clocked graphs. Together these two constraints require every cycle to contain a multiple of $k$ latches for a $k$-phase clock. In the case of level-clocked circuits, this constraint must be combined with constraints on the clock schedule to ensure that all cycles contain at least one disabled latch at all times. This will be provided by the valid clock schedules described later in this section.

If we define the clock phase of a vertex $v$, denoted $P(v)$, as the phase of the latch immediately preceding $v$ on any path leading to $v$, then the latch immediately following vertex $v$ on any path has phase $P(v) + 1$.

Retiming a level-clocked graph can now be defined similarly to retiming an edge-clocked graph. The definition of the retiming value $r(v)$ must be extended to include its effect on the latch sequence of adjacent edges. That is, for any edge $e: u \to v$, the relationship $w_r(e) = w(e) + r(v) - r(u)$ still holds. In addition, $r(v)$ latches (in phase order) are appended to $s(e)$ and $r(u)$ initial latches are deleted from $s(e)$ to form $s_r(e)$. (The case where $r$ is negative is treated symmetrically.)
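The weight relationship can be illustrated with a minimal sketch. The graph, the vertex names and the helper names `retimed_weights` and `is_legal` are assumptions made for illustration only; the sketch tracks latch counts per edge and ignores the latch sequences $s(e)$.

```python
def retimed_weights(edges, r):
    """Apply w_r(e) = w(e) + r(v) - r(u) to every edge (u, v).

    edges maps (u, v) -> w(e); r maps vertex -> retiming value.
    """
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def is_legal(edges, r):
    """A retiming is only usable if no edge ends up with a negative latch count."""
    return all(w >= 0 for w in retimed_weights(edges, r).values())


# Hypothetical three-vertex cycle carrying four latches in total.
EDGES = {("a", "b"): 2, ("b", "c"): 0, ("c", "a"): 2}
R = {"a": 0, "b": -1, "c": 0}

NEW_EDGES = retimed_weights(EDGES, R)
print(NEW_EDGES)
print(sum(EDGES.values()), sum(NEW_EDGES.values()))
```

In this example one latch moves from edge a→b to edge b→c, and the total number of latches on the cycle is unchanged, as expected for any retiming.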

Well-formed graphs avoid the complexities of identifying when to limit vertex retiming values to prevent movement of latches of differing phases across the vertex. The following lemma assures us that this will not happen in well-formed graphs. However, because the retiming value of the host vertex is restricted to 0, we can relax the well-formed definition on paths crossing $v_h$ to allow circuit inputs and outputs to occur on different clock phases, as long as cross-host constraints are not used when clocking with unequal phase clocks.

Lemma  1:  A  well-formed  circuit  graph  remains  well-formed  under  a  valid  retiming. 

Proof: Let $v$ be a vertex in the original graph and $v_r$ the corresponding vertex in the retimed graph. Let $P(v) = \phi_i$, and thus the phase of the latch following $v$ is $\phi_{i+1}$. A retiming value of $r(v) = 1$ removes the first latch from the latch sequence of each output edge and appends a latch of phase $i + 1$ to the latch sequence of each input edge. The case for $r(v) = -1$ is symmetric, and induction provides a proof for any value of $r(v)$. Thus latch ordering (W1) is maintained. That W2 is maintained for cycles follows from the retiming results for edge-clocked circuits [6]. □

4.1  Correct  Operation  of  Level-Clocked  Circuits 

Figure 9: Graphical representation of the constraints on the clock phases that are required for correct operation of a well-formed level-clocked circuit.

There are two conditions which must be met to ensure the correct operation of a level-clocked circuit. The first states that along any path in the circuit, the signal departing a latch must arrive at the next latch before the next latching edge for that latch. The second states that along any path in the circuit, the signal departing a latch must not arrive at the next latch before the previous latching edge for that latch. More precisely,

L1. Maximum delay: For any zero-weight path $p: l \leadsto m$, $A_m = D_l - E_{P(l),P(m)} + d(p) \le T_\Phi$.

L2. Non-interference: For any zero-weight path $p: l \leadsto m$, $A_m = D_l - E_{P(l),P(m)} + d(p) \ge 0$.

We now want to remove any assumption about minimum delay in a correctly operating level-clocked circuit. This allows us to avoid two-sided delay constraints and allows retiming to relocate latches without concern for retaining vertices between latches.

We  now  define  a  valid  clock  schedule  and  show  that  any  well-formed  circuit  operated  by  a 
valid  clock  schedule  satisfies  the  non-interference  constraint  L2.  That  is,  if  a  retiming  satisfies  the 
maximum  delay  constraint,  then  it  results  in  a  correctly  operating  circuit  even  with  zero  delays 
between  latches. 

A  clock  schedule  is  valid  if  it  meets  the  following  constraints: 

P1. $e_{i+1} \ge e_i$, which follows from the definition of a clock schedule (constraint a in Figure 9).

P2. $e_i + T_\Phi - T_{\phi_i} \ge e_{i+1}$ for $i \ne k$, and $e_i - T_{\phi_i} \ge e_1$ for $i = k$ (constraint b in the figure).


Note that these constraints allow for multiple phase clock schedules with overlapping and underlapping phases. However, two-phase clocks are required to be non-overlapped.

It follows from constraint P2 that there is no time $t$ where $e_i - T_{\phi_i} < t < e_i$ for every $i = 1, \ldots, k$. That is, not all latches can be active simultaneously and thus we avoid race conditions in cycles.


Theorem 2: Any well-formed level-clocked circuit operating with a valid clock schedule meets the non-interference constraint L2.


Proof: By Eqn. 2 on page 10, $D_l \ge T_\Phi - T_{P(l)}$, that is, the departure time from a latch must occur at or after the enabling edge. By constraint P2, $E_{i,i+1} \le T_\Phi - T_{\phi_i}$ and so $E_{P(l),P(l)+1} \le D_l$. Since $P(m) = P(l) + 1$ for any path $p: l \leadsto m$ with $w(p) = 0$ in a well-formed graph, $E_{P(l),P(m)} \le D_l$ and thus $D_l - E_{P(l),P(m)} \ge 0$. Thus constraint L2 holds for any $d(p) \ge 0$. □

Corollary 3: The phase $P_r(u)$ of a node $u$ in a well-formed, retimed graph $G_r$ using a k-phase clock, given $P(u)$ in the initial graph G with $r(u)$ the retiming value of $u$, is:

$$P_r(u) = [P(u) + r(u)] \bmod k,$$

where $\phi_0 = \phi_k$.

Proof: $r(u) = n$ is defined as the movement of $n$ latches across node $u$. By the definition of a well-formed graph, $P(l_1) = P(l_0) + 1$ for any $l_0$, $l_1$ connected by a zero-weight path. Thus, $P(l_n) = [P(l_0) + n] \bmod k$ since there are k phases in the clock schedule and $\phi_{k+1} = \phi_1$ as defined. Under retiming:

$$P_r(u) = P(l_{r(u)}) = [P(l_0) + r(u)] \bmod k = [P(u) + r(u)] \bmod k. \quad \Box$$
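As a small illustration of Corollary 3, the sketch below computes the phase of a vertex after retiming. The function name `retimed_phase` is hypothetical, phases are assumed to be numbered 1 through k, and a result of 0 is read as phase k.

```python
def retimed_phase(phase, r, k):
    """Phase index in 1..k after moving r latches across a vertex.

    Implements P_r(u) = [P(u) + r(u)] mod k, with a result of 0 read as phase k.
    Python's % operator already returns a non-negative value for negative r.
    """
    p = (phase + r) % k
    return k if p == 0 else p


# Two-phase clock: each single-latch move flips the phase.
print(retimed_phase(1, 1, 2), retimed_phase(2, 1, 2))
# Three-phase clock: moving one latch the other way from phase 1 wraps to phase 3.
print(retimed_phase(1, -1, 3))
```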


5  Level-Clocked  Timing  Constraints 


This section derives the fundamental Theorem 4, which will provide the basis for ILP path constraint sets that ensure a valid retiming of a graph G for a given multi-phase clock schedule $\Phi$. The theorem provides an upper bound on the delay of an n-weight simple path in a level-clocked graph in terms of the departure time of a signal at the beginning of the path and the arrival time at the end. The proof is based on the maximum delay constraint L1 of the previous section, extended to paths of non-zero weight. Figure 10 gives a graphical representation of this theorem.

Theorem 4 provides an exact bound on the maximum possible delay of a path based on the departure time of signals from the latch preceding the path and the subsequent arrival time of signals at the latch terminating the path. For retiming purposes we are interested in a maximum bound on path delay, which is presented in Corollary 5. We then show in Corollary 6 that cycles additionally constrain the clock period and show how an analysis of critical cycles can be used to derive a lower bound on the clock period.

Finally  we  demonstrate  that  the  edge-clocked  definition  of  a  critical  path  between  two  nodes  is 
insufficient  to  ensure  correct  retiming  of  the  nodes  at  all  clock  periods.  A  new  definition  for  critical 
paths  is  derived  and  a  method  of  identifying  a  critical  path  between  two  nodes  is  presented. 


Figure 10: Graphical representation of the constraint on the simple path delay between two latches $l_0$ and $l_{n+1}$ in a correctly operating circuit.


Theorem 4: A multi-phase, level-clocked circuit graph G is correctly timed using a valid clock schedule if and only if for every simple path $p: l_0 \leadsto l_{n+1}$ with weight $w(p) = n$ and latch sequence $s(p) = \{l_0, l_1, \ldots, l_{n+1}\}$, the path delay $d(p)$ is bounded by:

$$d(p) \le A_{l_{n+1}} - D_{l_0} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})}$$

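Proof: ($\Rightarrow$) By induction on the weight of a path $w(p)$.

Basis: When $w(p) = 0$, $d(p) = A_{l_1} - D_{l_0} + E_{P(l_0),P(l_1)}$ by Eqn. 3 on page 10.

Induction: Divide $p$ into two paths $p_1: l_0 \leadsto l_1$ and $p_2: l_1 \leadsto l_{n+1}$. From Eqn. 3, $d(p_1) = A_{l_1} - D_{l_0} + E_{P(l_0),P(l_1)}$. By the inductive hypothesis, $d(p_2) \le A_{l_{n+1}} - D_{l_1} + \sum_{i=1}^{w(p)} E_{P(l_i),P(l_{i+1})}$. By Eqn. 2 on page 10, $A_{l_1} \le D_{l_1}$, and for a correctly operating circuit, $A_{l_1} \le T_\Phi$. Thus $A_{l_1} \le D_{l_1} \le T_\Phi$ and so

$$d(p) = d(p_1) + d(p_2) \le A_{l_{n+1}} - D_{l_0} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})}.$$

($\Leftarrow$) We show that if $d(p) > A_{l_{n+1}} - D_{l_0} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})}$ then the constraint on valid timing defined by Eqn. 3 must be violated at some latch.

Case 1: If $w(p) = 0$: Eqn. 3 is violated directly.

Case 2: If $w(p) > 0$: We assume that no zero-weight subpath $q$ of $p$ exists such that Eqn. 3 is violated and show by contradiction that this cannot be true. Since $d(p) = \sum_{i=0}^{w(p)} d(q_i)$ where $q_i: l_i \leadsto l_{i+1}$, and from our assumption $d(q_i) \le A_{l_{i+1}} - D_{l_i} + E_{P(l_i),P(l_{i+1})}$, therefore:

$$\sum_{i=0}^{w(p)} d(q_i) \le \sum_{i=0}^{w(p)} \left( A_{l_{i+1}} - D_{l_i} + E_{P(l_i),P(l_{i+1})} \right).$$

Because $A_{l_i} \le D_{l_i}$ at every intermediate latch, the right-hand side is at most $A_{l_{n+1}} - D_{l_0} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})}$. Substituting for $d(p)$ and $\sum d(q_i)$:

$$A_{l_{n+1}} - D_{l_0} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})} > A_{l_{n+1}} - D_{l_0} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})},$$

forming a contradiction. □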

The  following  two  corollaries  use  the  minimum  departure  and  maximum  arrival  times  of  signals 
from  latches  to  state  the  maximum  simple  path  delay  and  maximum  cycle  delay  in  terms  of  the 
clock  schedule  and  path  weight. 

Corollary 5: A multi-phase, level-clocked graph G using a valid clock schedule is correctly timed if and only if the delay of any simple path $p: l_0 \leadsto l_{n+1}$ is bounded by:

$$d(p) \le T_{P(l_0)} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})}.$$

Proof: The result follows directly from Theorem 4 by observing from Eqn. 2 that the minimum departure time is $D_{l_0} = T_\Phi - T_{P(l_0)}$ and from constraint L1 that the maximum arrival time is $A_{l_{n+1}} = T_\Phi$. □


Corollary 6: A multi-phase, level-clocked graph G using a valid clock schedule is correctly timed if and only if the delay of any cycle $c: l_0 \leadsto l_{n+1}$ is bounded by:

$$d(c) \le \sum_{i=0}^{w(c)-1} E_{P(l_i),P(l_{i+1})}.$$

Proof: The result follows directly from Theorem 4 by observing that $l_0 = l_{n+1}$ and thus $A_{l_{n+1}} \le D_{l_0}$. Additionally, since the weight of a cycle includes all latches placed on the cycle, the weight of a path $l_0 \leadsto l_{n+1}$ is $w(c) - 1$. □

Given the result of Corollary 6 we can form a tight lower bound, based on cycle delays, on the clock periods for which the circuit will operate correctly. Unlike in edge-clocked circuits, this lower bound may be more restrictive in some cases than path-based constraints for the same circuit. Hence the critical cycle period must be found independently of path constraints. The following theorem derives the lower bound on the clock period of a circuit based on cycles in the graph.

Theorem 7: For any correctly operating, well-formed graph G using a k-phase clock schedule:

$$\forall \text{ cycles } c \in G: \quad T_\Phi \ge k \cdot \frac{d(c)}{w(c)}.$$

Proof: By Corollary 6:

$$d(c) \le \sum_{i=0}^{w(c)-1} E_{P(l_i),P(l_{i+1})}.$$

In a well-formed graph each cycle must contain latches of each phase. By Eqn. 1 on page 9, $\sum_{i=1}^{k} E_{i,i+1} = T_\Phi$. Hence:

$$d(c) \le \sum_{i=1}^{w(c)} \frac{T_\Phi}{k} = w(c) \cdot \frac{T_\Phi}{k}. \quad \Box$$


In our search for an optimal retiming, we are restricted to clock periods no smaller than $k \cdot \frac{d(c)}{w(c)}$ for every cycle $c$. A critical cycle, denoted $c_r$, is a cycle which maximally restricts the clock period, that is, a cycle for which $\frac{d(c)}{w(c)}$ is maximum. The value of $\frac{d(c)}{w(c)}$ for a critical cycle in the graph may be found by setting the values $a(u \to v) = d(v)$ and $\ell(e) = w(e)$, and solving the maximum-ratio-cycle problem for G. Polynomial-time algorithms are available to solve this problem from Megiddo [9] and [2]. In particular, the algorithm by Burns has a running time that is polynomial in $|E|$ and $|V|$ and depends on the weight of a cycle in G. The resulting critical cycle period $T_{crit}$ provides a lower bound on the cycle time of the circuit. Although this clock cycle may not be realizable due to restrictions of the more general path constraints, it provides a useful starting point in searching for the optimum cycle time of the circuit.
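The bound of Theorem 7 can be evaluated directly on a small graph. The sketch below is not the Megiddo or Burns algorithm: it simply enumerates every simple cycle, which is exponential in general and only meant to make the definition concrete. The graph, the integer vertex labels, the delays and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction


def simple_cycles(adj):
    """Yield each simple cycle once, as a vertex list starting at its smallest vertex."""
    for start in sorted(adj):
        stack = [(start, [start])]
        while stack:
            v, path = stack.pop()
            for nxt in adj.get(v, {}):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))


def critical_cycle_period(adj, delay, k):
    """Largest value of k * d(c) / w(c) over all simple cycles c (None if acyclic).

    adj[u][v] is the latch count w(e) of edge u -> v; delay[v] is an integer d(v).
    """
    best = None
    for cyc in simple_cycles(adj):
        d = sum(delay[v] for v in cyc)
        w = sum(adj[u][v] for u, v in zip(cyc, cyc[1:] + cyc[:1]))
        if w == 0:
            raise ValueError("cycle without latches violates W2")
        bound = Fraction(k * d, w)
        best = bound if best is None else max(best, bound)
    return best


# Hypothetical two-phase example with two cycles through vertex 0.
ADJ = {0: {1: 2}, 1: {2: 0, 3: 2}, 2: {0: 0}, 3: {2: 0}}
DELAY = {0: 0, 1: 3, 2: 7, 3: 3}
T_CRIT = critical_cycle_period(ADJ, DELAY, 2)
print(T_CRIT)
```

Here the cycle 0→1→2→0 has delay 10 and weight 2, so it limits the period to at least 10, while the longer cycle through vertex 3 (delay 13, weight 4) only requires 6.5.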


5.1  Critical  Paths 

Now that a lower bound on possible clock periods has been established based on cycles in the graph, a search process must be performed to determine the minimum clock period above that bound for which a retiming can be found that satisfies path constraints. To avoid having to determine constraints for all paths in the circuit, which is possibly exponential in the number of edges, a critical path between any two nodes $u$ and $v$ is found such that the minimum weight constraints for all paths between the nodes can be met by just satisfying the constraints of the critical paths.

Figure 11: Critical paths in level-clocked circuits may not be the same as critical paths in edge-clocked circuits.

We redefine a critical path for a circuit to be the path $p: u \leadsto v$ such that if the minimum weight constraint is met for $p$, then it is met for all paths from $u$ to $v$ for any valid retiming. Critical paths are more difficult to determine in level-clocked circuits. The reason is that the path limiting the clock period may not be a zero-weight path as guaranteed for edge-clocked circuits. This is demonstrated in Figure 11. Three paths exist between nodes $u$ and $v$, labeled from top to bottom:


path   vertices                w(p)   d(p)
p      u → v2 → v3 → v         4      9
q      u → v1 → v              2      3
r      u → v4 → v5 → v         2      5

In an edge-clocked circuit, path r is clearly critical since $w(r) = \min\{w(p), w(q), w(r)\}$ and $d(r) = \max\{d(q), d(r)\}$. However, if we consider this a level-clocked circuit with 2 equal phases and period $T_\Phi = 2$, and use the techniques presented in Section 6, the minimum required weight of path p is 7 while that of path r is 4. If path r were selected as the critical path between u and v, a retiming which results in $w_r(r) = 4$ would be considered successful even though (since retiming maintains a constant difference in path weight between p and r) the resulting $w_r(p) = 6$, which is below the weight required for p. Under the new definition, the critical path between u and v is path p for clock period $T_\Phi = 2$. Note that the critical path under the new definition will vary with differing clock periods. For instance, the critical path in Figure 11 at $T_\Phi = 10$ is r instead of p.

We must now identify the most constraining path from $u$ to $v$ for a given clock period. The following lemma provides the basis for efficiently determining a critical path.

Lemma 8: A path $p: u \leadsto v$ in a well-formed circuit is a critical path if

$$\left\{ w(p)\frac{T_\Phi}{k} - d(p) \right\} \le \left\{ w(q)\frac{T_\Phi}{k} - d(q) \right\} \quad \text{for all } q: u \leadsto v.$$

Proof: By contradiction. The delay constraint of Corollary 5 may be restated as:

$$T_{P(l_0)} + \sum_{i=0}^{w(p)} E_{P(l_i),P(l_{i+1})} - d(p) \ge 0.$$

Assume that $p: u \leadsto v$ is a path where $\{w(p)\frac{T_\Phi}{k} - d(p)\} \le \{w(q)\frac{T_\Phi}{k} - d(q)\}$ for all paths $q: u \leadsto v$ but $p$ is not a critical path. This implies that there exists some $q: u \leadsto v$ and a retiming such that:

$$T_{P(l_0)} + \sum_{i=0}^{w_r(p)} E_{P(l_i),P(l_{i+1})} - d(p) \ge 0 \quad \text{and} \quad T_{P(l_0)} + \sum_{i=0}^{w_r(q)} E_{P(l_i),P(l_{i+1})} - d(q) < 0.$$

That is, the minimum weight constraint is satisfied for $p$ but not for $q$ in the retimed circuit. However, since retiming maintains a constant difference in path weight for $p$ and $q$, and the graph is well formed, $w_r(q) - w_r(p) = km = w(q) - w(p)$ where $k$ is the number of clock phases and $m$ is some integer. Combining the weight constraint inequalities gives:

$$T_{P(l_0)} + \sum_{i=0}^{w_r(p)} E_{P(l_i),P(l_{i+1})} - d(p) > T_{P(l_0)} + \sum_{i=0}^{w_r(q)} E_{P(l_i),P(l_{i+1})} - d(q),$$

$$\sum_{i=0}^{w_r(p)} E_{P(l_i),P(l_{i+1})} - d(p) > \sum_{i=0}^{w_r(q)} E_{P(l_i),P(l_{i+1})} - d(q),$$

$$-d(p) > \sum_{i=w_r(p)+1}^{w_r(p)+km} E_{P(l_i),P(l_{i+1})} - d(q).$$

Using the result from Eqn. 1 on page 9, $\sum_{i=w_r(p)+1}^{w_r(p)+km} E_{P(l_i),P(l_{i+1})} = mT_\Phi$. Hence:

$$-d(p) > mT_\Phi - d(q),$$

$$-d(p) > km\frac{T_\Phi}{k} - d(q),$$

$$-d(p) > (w(q) - w(p))\frac{T_\Phi}{k} - d(q),$$

$$w(p)\frac{T_\Phi}{k} - d(p) > w(q)\frac{T_\Phi}{k} - d(q),$$

which contradicts our initial assumption. □

We now determine the values in the matrices $D(u,v,T_\Phi)$ and $W(u,v,T_\Phi)$ as $d(p_c)$ and $w(p_c)$ for a critical path $p_c: u \leadsto v$. This in turn requires identifying the path which minimizes the quantity $\{w(p)\frac{T_\Phi}{k} - d(p)\}$ over all paths $u \leadsto v$. We can find the paths which minimize this value by running an all-pairs shortest path algorithm on G using new edge weights $\hat{w}(e) = w(e)\frac{T_\Phi}{k} - d(v_2)$ for each edge $e: v_1 \to v_2$.

The Floyd-Warshall algorithm may be used to solve the all-pairs shortest path problem since it will handle the possibly negative weight values of $\hat{w}(e)$ as long as there are no negative weight cycles in the graph. This requires showing that there is no cycle $c$ for which $\hat{w}(c) = w(c)\frac{T_\Phi}{k} - d(c) < 0$. As a result of Theorem 7, we can place a lower bound on the clock period used to retime a circuit, that is, we will use only clock periods such that $T_\Phi \ge k \cdot \frac{d(c)}{w(c)}$ for all cycles $c$ in G. Thus there can be no negative weight cycles for clock periods of interest and all critical paths can be determined efficiently.
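A minimal sketch of this computation follows. It runs Floyd-Warshall on the reweighted edges and carries the latch count and delay of the chosen path alongside its cost. The function name, the data layout, and the vertex delays and edge weights of the example graph are assumptions: they are chosen only so that the three u-to-v paths have the weights and delays listed for Figure 11, and the figure itself may assign them differently.

```python
from fractions import Fraction
from itertools import product


def critical_path_matrices(vertices, edges, delay, period, k):
    """Return (W, D) for the least-slack path between every connected vertex pair.

    edges maps (u, v) -> w(e); delay maps vertex -> d(v).
    The edge cost is w(e) * T / k - d(v), so d(u) of the first vertex is added back at the end.
    """
    unit = Fraction(period) / k                     # T_Phi / k
    best = {}                                       # (u, v) -> (cost, weight, delay after u)
    for (u, v), w in edges.items():
        best[(u, v)] = (w * unit - delay[v], w, delay[v])
    for m, u, v in product(vertices, repeat=3):     # m is the outermost loop variable
        if (u, m) in best and (m, v) in best:
            c1, w1, d1 = best[(u, m)]
            c2, w2, d2 = best[(m, v)]
            if (u, v) not in best or c1 + c2 < best[(u, v)][0]:
                best[(u, v)] = (c1 + c2, w1 + w2, d1 + d2)
    W = {uv: w for uv, (_, w, _) in best.items()}
    D = {(u, v): d + delay[u] for (u, v), (_, _, d) in best.items()}
    return W, D


# Hypothetical delays and edge weights reproducing w(p)=4, d(p)=9; w(q)=2, d(q)=3; w(r)=2, d(r)=5.
V = ["u", "v1", "v2", "v3", "v4", "v5", "v"]
DELAY = {"u": 1, "v1": 1, "v2": 3, "v3": 4, "v4": 1, "v5": 2, "v": 1}
EDGES = {("u", "v2"): 2, ("v2", "v3"): 1, ("v3", "v"): 1,
         ("u", "v1"): 1, ("v1", "v"): 1,
         ("u", "v4"): 1, ("v4", "v5"): 0, ("v5", "v"): 1}

W_FAST, D_FAST = critical_path_matrices(V, EDGES, DELAY, 2, 2)
W_SLOW, D_SLOW = critical_path_matrices(V, EDGES, DELAY, 10, 2)
print(W_FAST[("u", "v")], D_FAST[("u", "v")])
print(W_SLOW[("u", "v")], D_SLOW[("u", "v")])
```

With a period of 2 the entry for (u, v) is taken from path p, and with a period of 10 it is taken from path r, which matches the discussion of Figure 11.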

Note that unlike edge-clocked circuits, in general D and W must be recomputed for every clock period attempted by the retiming. However, returning to Figure 11 we can plot the result of $\{w(p)\frac{T_\Phi}{k} - d(p)\}$ from Lemma 8 for each path against the clock period $T_\Phi$. The resulting plot is shown in Figure 12. The slack value for each path $p$ is a linear function in clock period with slope $\frac{w(p)}{k}$ and y-axis intercept at $-d(p)$. At all clock periods greater than the intersection point between the slack functions for two paths, one of the paths will have less slack than the other, and the other path will have less slack at all clock periods less than the intersection point. For any two paths of the same weight, the path with greater delay will always have less slack than the other. Due to these properties, if a particular path is critical for any two clock periods, it will also be critical for all clock periods in between.

Figure 12: Plot of slack vs. clock period for paths p, q and r.

¹ The sum of all $d(v_2)$ terms in $\hat{w}$ equals $d(u \leadsto v) - d(u)$ rather than $d(p)$. However, because node $u$ is the first node in all paths $u \leadsto v$, minimization of $\hat{w}$ will minimize $\{w(p)\frac{T_\Phi}{k} - d(p)\}$ as well.

Once two clock periods are found, one above the optimal clock period ($T_{\Phi_{opt+}}$) and one below the optimal clock period ($T_{\Phi_{opt-}}$), for which $W(u,v,T_{\Phi_{opt-}}) = W(u,v,T_{\Phi_{opt+}})$ for all values $u$ and $v$, then for any clock period $T_{\Phi_{opt-}} \le T_\Phi \le T_{\Phi_{opt+}}$ the arrays W and D will remain constant and need not be recomputed. Additionally, because the slope of each slack function is $\ge 0$, if $W(u,v,T_{\Phi_{opt-}}) = \min\{w(p) \mid p: u \leadsto v\}$ then $W(u,v,T_\Phi) = W(u,v,T_{\Phi_{opt-}})$ for all $T_\Phi > T_{\Phi_{opt-}}$.
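The slack comparison for the three paths of Figure 11 can be checked numerically. The sketch below uses the weights and delays from the path table with a two-phase clock; the function names are hypothetical.

```python
from fractions import Fraction

# (w(p), d(p)) for the three paths of Figure 11, as listed in the path table.
PATHS = {"p": (4, 9), "q": (2, 3), "r": (2, 5)}


def slack(w, d, period, k):
    """Slack w(p) * T / k - d(p) of a path at clock period T."""
    return Fraction(w) * Fraction(period) / k - d


def critical_path(paths, period, k):
    """Name of the path with the least slack at this clock period."""
    return min(paths, key=lambda name: slack(*paths[name], period, k))


def crossover(path_a, path_b, k):
    """Clock period at which two paths of different weight have equal slack."""
    (wa, da), (wb, db) = path_a, path_b
    return Fraction(k * (da - db), wa - wb)


print(critical_path(PATHS, 2, 2), critical_path(PATHS, 10, 2))   # prints "p r"
print(crossover(PATHS["p"], PATHS["r"], 2))                      # prints "4"
```

The slack lines of p and r cross at a period of 4, so p is the more constraining path below that period and r above it.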

6  Retiming  for  Equal  Phase  Clocks 

The theorems in the previous section form the basis for a set of constraints which can be used to determine whether a retiming exists for a particular clock schedule. In this section we investigate a simple clock schedule with equal length phases. In Section 7 we extend the capability to more complex clocks with unequal length phases. The resulting constraint sets can be solved in a manner similar to the original Leiserson et al. methods to perform retiming of level-clocked circuits.

An equal-phase clock schedule is a valid k-phase clock set $\Phi = \{\phi_1, \ldots, \phi_k\}$ where all active phase periods $T_{\phi_i}$ are equal, and all phase shifts $E_{i,i+1} = \frac{T_\Phi}{k}$. Since the length of the active period is the same for all phases, we use $T_\phi$ to refer to the length of the active period of any phase. Note that this definition allows overlapped clock phases under the general constraints on valid clocks. Because the active periods and phase shift values of the phases of each adjacent pair of latches are equal, the retiming process can ignore the actual phase of the individual latches. In the identical manner to edge-clocked circuits, a retiming value $r(v)$ is assigned to each vertex of the circuit graph. However, a value of $r(v) = n$ now moves $n$ latches across the vertex rather than $n$ registers.

Proofs in [6, 7] for edge-clocked circuit graphs showing that all cycles maintain the same number of latches and that phase differences between paths with common endpoints remain constant also hold for level-clocked graphs. Additionally, because $r(v_h) = 0$, no new latches will be introduced from or transferred to the outside world. It is not possible to limit path constraints for level-clocked circuits to the length of zero-weight paths as in edge-clocked circuits. The delay of any latch-bounded path is affected by the paths preceding and following it, requiring constraints for higher weight paths as well.

Corollaries 5 and 6 provide a basis for retiming level-clocked circuits operating under a k-equal-phase clock schedule. These constraints take two forms: a minimum possible clock period based on simple cycles, and sets of timing constraints for given clock periods on paths.

The method used to form the required constraint set is to first guarantee that the minimum cycle period constraint imposed by Corollary 6 will be met. Following identification of the minimum possible clock period based on cycles, we combine the result of Corollary 5 with knowledge of an equal-phase clock to derive $L(u,v)$, the minimum weight for a critical path between $u$ and $v$. Using this result, a pass is made through the W and D arrays corresponding to the critical paths in G to form a set of path constraints requiring $L(u,v)$ latches rather than one as in the previous work.

We  now  restate  the  maximum  delay  constraint  as  a  minimum  weight  constraint  which  provides 
a  lower  bound  on  the  number  of  latches  on  a  simple  path  in  terms  of  the  path  delay. 


Corollary 9: The weight of any simple path $p$ in a correctly operating, well-formed circuit graph G using a k-equal-phase clock schedule is bounded by:

$$w(p) \ge \left\lceil \frac{d(p) - T_\phi}{T_\Phi / k} \right\rceil - 1.$$

Proof: The result follows directly from Corollary 5 on page 14 using the fact that for a k-equal-phase clock schedule, $E_{i,i+1} = \frac{T_\Phi}{k}$ and $T_\phi = T_{P(l_0)}$. □
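The bound of Corollary 9 is easy to evaluate. The sketch below assumes the bound has the form $\lceil (d(p) - T_\phi) / (T_\Phi / k) \rceil - 1$, which is what Corollary 5 gives when every phase shift equals $T_\Phi / k$. The function name is hypothetical, and exact rational arithmetic is used so that the ceiling is not affected by rounding.

```python
from fractions import Fraction
from math import ceil


def min_latches(d, period, active, k):
    """Lower bound on w(p): ceil((d - T_phi) / (T_Phi / k)) - 1.

    d is the path delay, period is T_Phi, active is T_phi, k is the number of phases.
    The value can be negative for short paths, which simply means no latch is required.
    """
    return ceil((Fraction(d) - Fraction(active)) * k / Fraction(period)) - 1


# Path p of Figure 11 with an ideal two-phase clock of period 2 (active period 1).
print(min_latches(9, 2, 1, 2))      # prints 7
# A delay just above 1.5 periods of an ideal two-phase clock of period 10.
print(min_latches(16, 10, 5, 2))    # prints 2
```

The first call reproduces the requirement of 7 latches quoted for path p in the discussion of Figure 11.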


We define

$$L(u,v) = \left\lceil \frac{D(u,v) - T_\phi}{T_\Phi / k} \right\rceil - 1$$

as the minimum number of latches required on a critical path from $u$ to $v$. This value forms the basis for a set of constraints for retiming well-formed circuit graphs controlled by a k-equal-phase clock schedule for a given clock period $T_\Phi$:

I/O: $r(v_h) = 0$

Positive edge weights: $r(u) - r(v) \le w(e)$ for all edges $e: u \to v$

Maximum path delay: $r(u) - r(v) \le W(u,v) - L(u,v)$ for all $(u,v)$ such that $L(u,v) \ge 1$
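Every constraint in this set has the form $r(u) - r(v) \le c$, so the set is a system of difference constraints. One standard way to solve such a system is Bellman-Ford relaxation, sketched below. This is a generic technique and not necessarily the solver used by the authors; the function name, the tuple layout and the example values are assumptions.

```python
def solve_difference_constraints(vertices, constraints, host):
    """Find integers r with r(u) - r(v) <= c for every (u, v, c), and r(host) = 0.

    Returns a dict of retiming values, or None if the constraints are infeasible.
    Starting every value at 0 plays the role of a virtual source connected to all vertices.
    """
    dist = {x: 0 for x in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, c in constraints:
            if dist[v] + c < dist[u]:          # tighten r(u) to satisfy r(u) <= r(v) + c
                dist[u] = dist[v] + c
                changed = True
        if not changed:
            # Shifting all values by a constant preserves every difference.
            return {x: dist[x] - dist[host] for x in vertices}
    return None                                 # still changing: negative cycle, infeasible


# Hypothetical circuit: edges h->a (2 latches), a->b (0), b->h (0).
VERTS = ["h", "a", "b"]
EDGE_CONSTRAINTS = [("h", "a", 2), ("a", "b", 0), ("b", "h", 0)]
# Suppose the critical path a ~> h has W = 0 but needs L = 1 latch: r(a) - r(h) <= -1.
PATH_CONSTRAINTS = [("a", "h", 0 - 1)]

R = solve_difference_constraints(VERTS, EDGE_CONSTRAINTS + PATH_CONSTRAINTS, "h")
print(R)
```

In this example the solver returns $r(a) = -1$, which moves one latch from edge h→a onto edge a→b so that the path from a to h carries the required latch.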


6.1 Correlator Example Revisited

The correlator shown in Figure 3 on page 4 can be transformed into a well-formed, two-phase, level-clocked circuit by replacing each register in the edge-clocked circuit with a pair of $\phi_1$, $\phi_2$ latches, thereby doubling the weight of each edge. Retiming this example using a two-equal-phase, non-overlapped clock schedule leads to the circuit graph of Figure 5 on page 4. The W and D matrices are the same as in the original example except that all values in W are multiplied by two to reflect the conversion to latches.

Finding $T_{crit}$ results in a value of 10 units. Several cycles are critical in this particular graph:

Vertices in cycle                          d(cycle)   w(cycle)   d/w
v_h, v_1, v_7, v_h                         10         2          5
v_h, v_1, v_2, v_6, v_7, v_h               20         4          5
v_h, v_1, v_2, v_3, v_5, v_6, v_7, v_h     30         6          5

Retiming to an ideal two-equal-phase clock schedule with $T_\Phi = 10$ ($d/w = 5$ for any critical cycle, which requires $T_\Phi = 10$ for a 2-phase clock) requires that the following path constraints be met:


The above set of constraints, when combined with the necessary edge constraints, may be solved successfully to define a set of retiming values resulting in the latch placement in Figure 5.

6.2  Determining  Potential  Optimum  Clock  Periods 

It is not always possible to retime level-clocked circuits to the lower bound clock period given by a critical cycle as in the previous example. For a theoretically precise optimum value, it becomes necessary to determine a set of possible optimum clock periods over which to search for a minimum legal clock period. In an optimal edge-clocked circuit there exists some critical path of zero weight with delay exactly the value of the clock period, and thus it is a simple matter to make a list of all path delays from the D array and perform a binary search on that list to determine the optimal $T_\Phi$. In level-clocked circuits the critical path may be many clock phases in length and of weight greater than zero. Additionally, the critical path between two vertices may change given two differing clock periods. Enumeration of all possible paths between two vertices can be exponential in the number of edges and so is not practical as a starting point for determining possible critical path weights.

The following theorem uses the fact that for an optimal retiming of a level-clocked circuit graph (not retimed to the critical cycle period) there will exist some critical path which exactly meets the minimum weight requirements. If this were not true there would exist a faster clock period for the same weighting. As we approach the optimal clock period for the graph we identify a range in which the optimal clock period exists and over which the critical paths in the circuit graph do not change. However, in certain cases, when the intersection of slack values for two paths occurs at precisely the optimal clock period, it is not possible to determine a range over which the critical path arrays will match. In this case our algorithm must exit at some desired level of accuracy. Assuming a range over the optimal period is found with matching W arrays, we change the inequality from the minimum path weight result in Corollary 9 to an equality and form a set of possible optimal clock periods.

Path constraints for the correlator example of Section 6.1 ($T_\Phi = 10$):

Path u ⇝ v                                          # latches req'd   D(P)
1⇝…, 1⇝…, 2⇝5, 5⇝6, 6⇝7, 7⇝2                        1                 …
1⇝5, 2⇝7, 3⇝6, 4⇝6, 6⇝1, 7⇝3, 7⇝6                   2                 > 1.5 T_Φ
5⇝7, 6⇝3, 7⇝5                                       3                 …
3⇝1, 4⇝1, 5⇝2, 6⇝4, 6⇝5                             4                 > 2.5 T_Φ
4⇝3, 5⇝4                                            5                 > 3 T_Φ

Theorem 10: The optimal cycle time of a well-formed circuit graph G clocked by a k-equal-phase clock is in the set C:

$$C = \left\{ \frac{D(u,v)}{\frac{n+1}{k} + \frac{T_\phi}{T_\Phi}} \right\}$$

taken over vertex pairs $(u,v)$ and integer latch counts $n$.

Proof: Follows from setting the left and right hand sides of Corollary 9 equal and solving for $T_\Phi$. Because the value $T_\phi$ is proportional to $T_\Phi$, substitute in the duty cycle ($T_\phi / T_\Phi$) instead, which remains constant for the clock schedule. □
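The step described in the proof can be sketched as follows. Treating Corollary 9 as the equality $n + 1 = (d - \delta T_\Phi) / (T_\Phi / k)$ with duty cycle $\delta = T_\phi / T_\Phi$ and solving for $T_\Phi$ gives $T_\Phi = d / ((n+1)/k + \delta)$. The function name and the particular (delay, latch count) pairs below are assumptions chosen for illustration; they are not taken from the correlator's D and W arrays.

```python
from fractions import Fraction


def candidate_period(d, n, k, duty):
    """Clock period at which a path of delay d needs exactly n latches.

    duty is T_phi / T_Phi and should be given exactly (a Fraction or a string such as "0.4").
    """
    return Fraction(d) / (Fraction(n + 1, k) + Fraction(duty))


# Two-phase clock with a 0.4 duty cycle; illustrative (delay, latches) pairs.
PAIRS = [(30, 4), (20, 2), (10, 0)]
CANDIDATES = sorted(candidate_period(d, n, 2, "0.4") for d, n in PAIRS)
print([round(float(t), 3) for t in CANDIDATES])
```

For these inputs the candidates are 300/29, 200/19 and 100/9, or about 10.345, 10.526 and 11.111.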

Real circuit graphs have built-in error due to estimation of combinational logic delays, and thus the value of generating a precise set of possible optimum clock periods is questionable. Large, complex circuits with many combinations of possible path delays and weights have a densely populated set of possible optimum clock periods. A more reasonable approach is simply to perform a search over real values to the desired level of accuracy given some knowledge of the accuracy of the circuit modeling process. However, the technique of finding an exact optimum is presented here to show that it is significantly different from the original work.

We can now define a new algorithm for finding the optimal retiming of a k-equal-phase, level-clocked circuit graph:

Algorithm: Optimal k-Equal-Phase Retiming:

1. Determine the critical cycle period $T_{crit} = k \cdot \max_c \frac{d(c)}{w(c)}$. Attempt to retime to $T_{crit}$; if successful, $T_{\Phi_{opt}} = T_{crit}$.

2. Repeatedly multiply the value of $T_\Phi$ by $\alpha$ until a legal retiming is found. Set $T_{\Phi_{opt+}}$ to the clock period of the first legal retiming. Set $T_{\Phi_{opt-}}$ to $\frac{T_{\Phi_{opt+}}}{\alpha}$.

3. If $W(u,v,T_{\Phi_{opt-}}) = W(u,v,T_{\Phi_{opt+}})$ for all $u$ and $v$, proceed to step 4; otherwise perform a binary search over $[T_{\Phi_{opt-}}, T_{\Phi_{opt+}}]$ until matching W arrays are found or a desired level of accuracy is met.

4. Perform a binary search over the set of possible optimum cycle times C computed from W and D to find the minimum value for which a legal retiming can be found.

As pointed out above, finding an exact optimal solution in practice is probably not worthwhile. Instead eliminate Step 4 above and replace Step 3 with:

3. Perform a binary search over $[T_{\Phi_{opt-}}, T_{\Phi_{opt+}}]$ until the desired level of accuracy is reached.

Since in practice $T_{\Phi_{opt}}$ will likely be near the value of $T_{crit}$, biasing the search pattern such that the lower end is favored is often worthwhile. Our system uses $\alpha = 1.25$. Additionally, because high values of $n$ in Theorem 10 result in the smallest values in set C, the set may be generated incrementally as needed rather than all at once.
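The simplified search (Steps 1 and 2 followed by the replacement Step 3) can be sketched as below. The callback `can_retime` stands in for the constraint-solving step and is purely hypothetical, as is the function name. The sketch also assumes that feasibility is monotone in the clock period, which is what makes a binary search meaningful.

```python
def search_min_period(t_crit, can_retime, alpha=1.25, tol=1e-3):
    """Approximate the smallest clock period for which can_retime(period) is True.

    t_crit is the critical cycle period; alpha is the growth factor of Step 2.
    The result is feasible and lies within tol of the true threshold.
    """
    if can_retime(t_crit):                 # Step 1: the lower bound itself is achievable
        return t_crit
    lo, hi = t_crit, t_crit * alpha        # Step 2: grow until a legal retiming exists
    while not can_retime(hi):
        lo, hi = hi, hi * alpha
    while hi - lo > tol:                   # Step 3: bisect between illegal lo and legal hi
        mid = (lo + hi) / 2
        if can_retime(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Stand-in oracle: pretend retiming succeeds exactly when the period is at least 10.345.
RESULT = search_min_period(10.0, lambda period: period >= 10.345)
print(round(RESULT, 2))
```

With the stand-in oracle the search starts at 10, finds a legal period at 12.5 after one multiplication by 1.25, and then narrows the interval down to the threshold.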

Example  2:  A  2-equal-phase  example  with  phase  underlap 

Real circuits cannot be designed with an ideal clock schedule as was used in the previous example. Instead a typical clock schedule might have each active period $T_\phi = 0.4T_\Phi$, giving an underlap between phases of $0.1T_\Phi$. In this example we retime the correlator circuit graph using such a clock schedule.

Figure 13: The correlator circuit optimally retimed using a 2-phase, equal-period clock where $T_\phi = 0.4T_\Phi$ and $T_\Phi = 10.345$ units.

As in the previous example the critical cycle period is 10; however, in this case a retiming to that clock period cannot be found. A legal retiming is found at a clock period of 20, and the $W$ and $D$ matrices match at 20 and 10. The set of possible time periods $C$ is:

$$C = \{10.0,\ 10.345,\ 10.526,\ 10.833,\ 11.053,\ 11.111,\ 11.25, \ldots\}$$

A binary search over this list finds the fastest time possible, 10.345. The circuit retimed to this clock schedule is shown in Figure 13.
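The search over the candidate set can be sketched as an ordinary binary search on a sorted list. In this sketch `can_retime` is again a hypothetical feasibility oracle, not a function from the paper, and the search is only valid under the assumption that feasibility is monotone in the clock period.

```python
from typing import Callable, Optional, Sequence


def smallest_feasible(
    candidates: Sequence[float],          # sorted in increasing order
    can_retime: Callable[[float], bool],  # hypothetical feasibility oracle
) -> Optional[float]:
    """Return the smallest candidate period that admits a legal retiming."""
    lo, hi = 0, len(candidates) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if can_retime(candidates[mid]):
            best = candidates[mid]  # feasible: look for something smaller
            hi = mid - 1
        else:
            lo = mid + 1            # infeasible: the answer lies above
    return best
```

With the seven listed values of $C$ and an oracle that rejects 10.0 but accepts every larger candidate, the function examines 10.833, then 10.345, then 10.0, and returns 10.345, which matches the result reported in the example.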


6.3  Reducing  the  Required  Number  of  Constraints 

We do not consider the larger number of constraints required for level-clocked retiming to be much of a problem, since the overall number of constraints is still limited to $|V|^2$. However, the expected number of constraints is much greater than for edge-clocked retiming, since long paths usually have a different constraint imposed on them than on their sub-paths, whereas in edge-clocked graphs constraints on long paths were usually redundant with a shorter sub-path. The exact relationship between the number of constraints for the two retiming methods is dependent on the graph structure and on the delay of vertices in the graph relative to the length of the clock period of interest. The correlator example is again useful here because it may easily be extended lengthwise and the number of constraints for different-sized graphs compared. Table 1 displays the relationship between the number of nodes in the correlator graph and the number of path constraints required for each technique to retime to the corresponding optimal clock period.

It is possible to limit the number of constraints required for retiming by limiting the number of latches through which borrowing is allowed. If borrowing is allowed only through $N$ latches, path constraints are defined as:

path constraints: $r(u) - r(v) \le W(u,v) - L(u,v)$ for any $0 < L(u,v) \le N$
limited path constraints: $r(u) - r(v) \le W(u,v) - L(u,v) - 1$ for any $N < L(u,v)$

Since  long  paths  are  now  over-constrained  and  a  greater  portion  of  the  path  constraints  will  be 
redundant  to  shorter  sub-paths,  limiting  borrowing  in  this  manner  reduces  the  number  of  constraints 
Correlator       Edge-Trig                     Ideal 2-equal-phase
size in nodes    T_opt         # path          T_opt      # path
                               constraints                constraints
    8            [illegible]      5            10            23
   10            [illegible]      8            10            41
   12            14              16            10.286        65
   14            14              20            10.286        95
   16            14              24            10.5         129
   20            14              32            10.5         219
   30            14              52            10.5         528
   50            14              92            10.5        1546
  100            14             192            10.5        6354

Table 1: Comparison of optimal clock periods found for varying sizes of correlator circuits.


N              # path         Ideal 2-equal-phase
               constraints    T_opt
 1             [illegible]    14.00
 2             [illegible]    12.74
 3             [illegible]    11.56
 4              658           11.72
 5              838           11.72
 6              880           11.02
 7             1056           10.94
[illegible]    1225           11.17
[illegible]    1309           11.02
10             1437           10.78
12             1757           10.78
15             [illegible]    10.78
16             2169           10.70
17             2240           10.50

Table 2: Optimal clock periods found while using restricted constraint sets that allow borrowing over N latches in the 100 node correlator example.


required  to  retime  the  graph  at  the  expense  of  finding  a  less  than  optimal  solution.  The  experimental 
results  shown  in  Table  2  demonstrate  that  the  100  node  correlator  example  can  be  retimed  to  the 
optimal  clock  period  with  many  fewer  constraints  than  those  used  for  the  most  general  case. 

The time values in Table 2 were derived using smaller, limited constraint sets. In most cases the resulting graph can actually be operated at a higher speed than that shown; however, the over-constrained retiming process cannot show that to be true without a less restrictive constraint set. The difficulty with this heuristic technique is also demonstrated: the optimal time period found does not decrease monotonically with an increasing number of levels. This is due to an interaction of the graph granularity with the level at which paths are over-constrained.


7  Retiming  of  Unequal  Phase  Circuits. 

Unlike equal-phase retiming, the minimum weight of a path under an unequal phase clock schedule depends on the phase controlling the latch at which the path begins. This difference is illustrated in Figure 14, which shows the maximum length of 0 and 1 weight paths beginning at a latch controlled by each phase of a 2-phase clock. Neither the edge-clocked retiming methods from [6, 7] nor the equal-phase retiming developed in Section 6 can differentiate which phase of latch is being moved across a particular vertex in the graph. However, because latch phases alternate along paths in well-formed graphs, the knowledge of what phase latch precedes a given vertex is directly available from the retiming values.

Figure 14: A two-phase underlapped clock schedule. Distances are the maximum time period for a path of given weight beginning at a vertex preceded by the adjacent clock phase.

First, the minimum weight value for a critical path is extended to a set of values $L_i(u, v)$, $i \in \{0 \ldots k-1\}$, indexed relative to the initial arrangement of latches in the circuit graph. Similarly, in order to form constraints based on the phase of the latch at which a path begins, the single retiming value $r(u)$ is divided into a set of values $r_i(u)$, $i \in \{1 \ldots k\}$. In a general ILP constraint set, new constraints are added to maintain the sequential movement of latches using these "phased retiming" values. Given these new capabilities, "phase specific" constraints are derived which require that the correct number of latches be placed along any path given any legal combination of phased retiming values. Although a complete set of ILP constraints may be formed and solved in this manner, it is also possible to find a solution using a modification of the Bellman-Ford technique, a constraint set of approximately the same size as required for equal-phase retiming, and the same number ($|V|$) of variables.


7.1  Minimum  Weight  Requirements 

The result of Corollary 5 on page 14 may be re-written to provide a general equation for the minimum weight of a path. The equation must be solved on a case by case basis.


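Corollary 11: The weight $w(p)$ of any path $u \leadsto v$ in a correctly timed, well-formed circuit graph $G$ using a valid $k$-phase clock schedule is bounded by:

$$
w(p) \ge k \left\lfloor \frac{d(p) - T_{P(u)}}{T_\Phi} \right\rfloor +
\begin{cases}
-1 & \text{if } (d(p) - T_{P(u)}) \bmod T_\Phi = 0 \\
0 & \text{if } 0 < (d(p) - T_{P(u)}) \bmod T_\Phi \le E_{P(u),P(u)+1} \\
1 & \text{if } E_{P(u),P(u)+1} < (d(p) - T_{P(u)}) \bmod T_\Phi \le E_{P(u),P(u)+1} + E_{P(u)+1,P(u)+2} \\
\vdots & \\
k-1 & \text{if } (d(p) - T_{P(u)}) \bmod T_\Phi > \sum_{j=0}^{k-2} E_{P(u)+j,P(u)+j+1}
\end{cases}
$$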

Proof Sketch: From Corollary 5 we have:

$$\sum_{j=0}^{w(p)} E_{P(u)+j,P(u)+j+1} \ge d(p) - T_{P(u)}$$

$$\left\lfloor \frac{w(p)}{k} \right\rfloor T_\Phi + \sum_{j=0}^{w(p) \bmod k} E_{P(u)+j,P(u)+j+1} \ge d(p) - T_{P(u)}$$

Applying $(\bmod\ T_\Phi)$ results in:

$$0 + \Big[ \sum_{j=0}^{w(p) \bmod k} E_{P(u)+j,P(u)+j+1} \Big] \bmod T_\Phi \ge [d(p) - T_{P(u)}] \bmod T_\Phi. \qquad (4)$$

Case 0: If $[d(p) - T_{P(u)}] \bmod T_\Phi \le E_{P(u),P(u)+1}$, then $w(p) \bmod k = 0$ and by Eqn. 1 on page 9,

$$w(p) = k \left\lfloor \frac{d(p) - T_{P(u)}}{T_\Phi} \right\rfloor$$

The remaining cases follow similarly. □

We define $L_i(u, v)$ as the minimum number of latches required on a critical path from $u$ to $v$ when $P_r(u) = P(u) + i$, $i \in \{0 \ldots (k-1)\}$, where $P_r(u)$ is the phase of node $u$ in the retimed graph $G_r$. Values for $L_i(u, v)$ are computed by substitution of $D(u, v)$ for $d(p)$ in Corollary 11. For notational convenience we will sometimes refer to $L_k$, which is equivalent to $L_0$ in all cases.

Now that a set of minimum weight values has been determined, it is necessary to form ILP constraint sets which require the correct number of latches on a path given the phase of the first node in the path. For example, in a 2-phase system, for any pair of nodes $u$ and $v$ the following two constraints are required:

$r(u) - r(v) \le W(u, v) - L_0(u, v)$ for $P_r(u) = P(u)$

$r(u) - r(v) \le W(u, v) - L_1(u, v)$ for $P_r(u) = P(u) + 1$

These two constraints may not be implemented simultaneously because of the conditional expression on when each is valid. If both were imposed, the minimum value of $L_i(u, v)$ would be the value required at all times on critical paths from $u$ to $v$. Instead we formulate a new set of variables which contain knowledge of the current phase of node $u$ and form constraints using those variables such that the correct value of $L_i(u, v)$ is imposed. The new variables for each node are known as "Phased Retiming" variables.


7.2  Phased  Retiming  Values 

We now split each retiming value $r(u)$ into a set $(r_1(u), r_2(u), \ldots, r_k(u))$ according to the following definition:

$$
r_i(u) =
\begin{cases}
\left\lfloor \frac{r(u)}{k} \right\rfloor + 1 & \text{for } i \le r(u) \bmod k \\
\left\lfloor \frac{r(u)}{k} \right\rfloor & \text{for } i > r(u) \bmod k
\end{cases}
$$

Physically, $r_i(u)$ represents the number of phase-$i$ latches moved across vertex $u$ by a retiming.

For notational convenience we will sometimes refer to $r_0$, which is equivalent to $r_k$ in all cases. In a sense we are exposing information about the phase of a node under any retiming given knowledge of the well-formed graph structure. The following lemma makes use of this information to form path-weight constraints which are specific to the current phase of the node beginning the path.
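The splitting of a retiming value into phased values can be sketched in a few lines of Python. The function name `phased_retiming_values` is an illustrative choice, not a name from the paper. The sketch relies on Python's `divmod`, which for a positive `k` returns the floor quotient and a remainder in the range 0 to k-1 even when `r` is negative, which is the convention the definition needs.

```python
from typing import List


def phased_retiming_values(r: int, k: int) -> List[int]:
    """Split retiming value r into [r_1, ..., r_k] for a k-phase clock."""
    base, extra = divmod(r, k)  # base = floor(r / k), extra = r mod k
    # The first `extra` phases receive one additional latch movement.
    return [base + 1 if i <= extra else base for i in range(1, k + 1)]
```

For example, `phased_retiming_values(5, 3)` gives `[2, 2, 1]` and `phased_retiming_values(-3, 2)` gives `[-1, -2]`; in both cases the values sum to the original retiming value and earlier phases never trail later ones.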

Lemma 12: A set of values $(r_1(u), r_2(u), \ldots, r_k(u))$ is a set of phased retiming values as defined above iff the following constraints are met:

Phased Variables: $r(u) = \sum_{i=1}^{k} r_i(u)$

Latch Ordering: $r_j(u) - r_i(u) \le 1$ and $r_i(u) - r_j(u) \le 0$, for $i \in \{1 \ldots k\}$, $j \in \{1 \ldots i-1\}$


Proof: (If:) Restating the two latch ordering constraints:

for all $r_j(u)$ where $j < i$, $r_j(u) \le r_i(u) + 1$. (5)

for all $r_j(u)$ where $j < i$, $r_j(u) \ge r_i(u)$. (6)

If any value $r_i(u) = r_k(u) + 1$, as allowed by constraint 5, then $r_k(u) + 1 \ge r_j(u) \ge r_i(u)$ for all $j < i$. Therefore $r_j(u) = r_i(u)$. In other words, if there is any $r_i(u)$ greater than $r_k(u)$ it will be greater by 1, and all values $r_j(u) = r_i(u) = r_k(u) + 1$ where $j < i$.

Thus, under the constraints, all $r_i(u)$ are equal or there exists exactly one index $i$ such that for all $j < i$, $r_i(u) = r_j(u)$ and for all $j > i$, $r_i(u) = r_j(u) + 1$.

Case 1: All $r_i(u)$ are equal. Since $r(u) = \sum_{i=1}^{k} r_i(u)$ and by Corollary 3, $r(u) \bmod k = 0$ and each $r_i(u) = \lfloor r(u)/k \rfloor$, satisfying the definition.

Case 2: The $r_i(u)$ are not all equal. Therefore $r_i = R + 1$ for $i \le m$ and $r_i = R$ for $i > m$, for some $R$ and $m$. Then:

$$r(u) = \sum_{i=1}^{k} r_i(u) = \sum_{i=1}^{m} (R + 1) + \sum_{i=m+1}^{k} R = m(R + 1) + (k - m)R = kR + m$$

Thus, $m = r \bmod k$ and $R = \lfloor r/k \rfloor$, and $r_i(u)$ meets the definition above.

(Only If:) Summing the $r_i(u)$ values results in:

$$\sum_{i=1}^{k} r_i(u) = k \left\lfloor \frac{r(u)}{k} \right\rfloor + r(u) \bmod k = r(u).$$

Let $j = r(u) \bmod k$. Then by definition, for all $i \le j$, $r_i(u) = r_j(u)$, and for all $i > j$, $r_i(u) = r_j(u) - 1$. Thus the Latch Ordering constraints are true. □


7.3  Phase  Specific  Constraints 


Using the expressions for minimum path weight and phased retiming values, it is now possible to formulate phase-specific constraints which impose weight restrictions on paths between nodes $u$ and $v$ conditional on the phase of node $u$.


Theorem 13: A well-formed graph $G$ using a $k$-phase clock schedule $\Phi$ operates correctly under a retiming iff for all $u$ and $v$ in $V$:

$$\sum_{i=1}^{k} r_i(u) \left[ L_i(u,v) - L_{i-1}(u,v) + 1 \right] - r(v) \le W(u,v) - L_0(u,v).$$
Proof: (If:) We expand the above equation to:

$$\sum_{i=1}^{k} r_i(u) L_i(u,v) - \sum_{i=1}^{k} r_i(u) L_{i-1}(u,v) + \sum_{i=1}^{k} r_i(u) - r(v) \le W(u,v) - L_0(u,v)$$

$$\sum_{i=1}^{k} r_i(u) L_i(u,v) - \sum_{i=0}^{k-1} r_{i+1}(u) L_i(u,v) + r(u) - r(v) \le W(u,v) - L_0(u,v)$$

$$\sum_{i=1}^{k} \left( r_i(u) - r_{i+1}(u) \right) L_i(u,v) + r(u) - r(v) \le W(u,v) - L_0(u,v) \qquad (7)$$

Case 1: If $r(u) \bmod k = 0$, $P_r(u) = P(u)$ and $r_i(u) - r_{i+1}(u) = 0$ for all $i$. Thus Eqn. 7 becomes:

$$r(u) - r(v) \le W(u,v) - L_0(u,v)$$

as desired.

Case 2: If $r(u) \bmod k = j$, $P_r(u) = P(u) + j$ and

$$
r_i(u) - r_{i+1}(u) =
\begin{cases}
1 & \text{for } i = j \\
-1 & \text{for } i = k \\
0 & \text{otherwise}
\end{cases}
$$

Thus Eqn. 7 becomes:

$$L_j(u,v) - L_0(u,v) + r(u) - r(v) \le W(u,v) - L_0(u,v)$$

$$r(u) - r(v) \le W(u,v) - L_j(u,v).$$

(Only If:) If the constraint set is not satisfied then for some constraint:

$$\sum_{i=1}^{k} r_i(u) \left[ L_i(u,v) - L_{i-1}(u,v) + 1 \right] - r(v) > W(u,v) - L_0(u,v).$$

Using the expansion to Eqn. 7:

$$\sum_{i=1}^{k} \left( r_i(u) - r_{i+1}(u) \right) L_i(u,v) + r(u) - r(v) > W(u,v) - L_0(u,v). \qquad (8)$$

Case 1: If $r(u) \bmod k = 0$, $P_r(u) = P(u)$ and $r_i(u) - r_{i+1}(u) = 0$ for all $i$. Thus Eqn. 8 becomes:

$$r(u) - r(v) > W(u,v) - L_0(u,v)$$

For a critical path $u \leadsto v$, $w_r(p) = W(u,v) - r(u) + r(v) < L_0(u,v)$. Thus the path weight is less than the minimum weight required for correct operation.

Case 2: If $r(u) \bmod k = j$, $P_r(u) = P(u) + j$ and Eqn. 8 becomes:

$$L_j(u,v) - L_0(u,v) + r(u) - r(v) > W(u,v) - L_0(u,v)$$

$$r(u) - r(v) > W(u,v) - L_j(u,v).$$

Again, for a critical path $u \leadsto v$, $w_r(p) = W(u,v) - r(u) + r(v) < L_j(u,v)$, and the path weight in the retimed graph is less than that required for correct operation. □
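The collapse of the summed constraint into a single phase-specific difference constraint can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: the function name `theorem13_lhs` and the sample latch counts are hypothetical, and the list `L` holds $L_0 \ldots L_{k-1}$ with $L_k$ taken to equal $L_0$ as in the text.

```python
from typing import Sequence


def theorem13_lhs(r_u: int, r_v: int, L: Sequence[int]) -> int:
    """Left-hand side of the Theorem 13 constraint for one node pair.

    L[i] is the assumed minimum latch count L_i(u, v) for i = 0..k-1,
    and L_k is treated as L_0.
    """
    k = len(L)
    base, extra = divmod(r_u, k)  # floor(r_u / k), r_u mod k
    total = 0
    for i in range(1, k + 1):
        r_i = base + 1 if i <= extra else base  # phased retiming value
        total += r_i * (L[i % k] - L[i - 1] + 1)
    return total - r_v
```

For any integer `r_u`, the result equals `r_u - r_v + L[r_u % k] - L[0]`, so comparing it against `W - L[0]` is the same as imposing `r_u - r_v <= W - L[r_u % k]`, which is the phase-specific constraint derived in the two cases of the proof.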

Figure 15: Level-clocked correlator example and resulting computational schedule when retimed to a 2-phase clock schedule where $T_\Phi = 10$, $T_{\phi_1} = 3$, and $T_{\phi_2} = 7$.

The complete set of constraints that must be met by a retiming of a multi-phase circuit graph $G$ using a valid $k$-phase clock schedule is:

I/O: $r(v_h) = 0$

Positive Edge Weight: $r(u) - r(v) \le w(e)$ for all edges $u \to v$

Phased Variables: $r(u) - \sum_{i=1}^{k} r_i(u) = 0$

Latch Ordering: $r_j(u) - r_i(u) \le 1$ and $r_i(u) - r_j(u) \le 0$, for $i \in \{1 \ldots k\}$, $j \in \{1 \ldots i-1\}$

Maximum Path Delay: $\sum_{i=1}^{k} r_i(u) \left( L_i(u,v) - L_{i-1}(u,v) + 1 \right) - r(v) \le W(u,v) - L_0(u,v)$


Because $L_i(u,v)$ is a constant value throughout the retiming process, each of the above equations is a legal ILP constraint with a sum of variables multiplied by constants on the left-hand side and a constant bound on the right-hand side. Additionally, because of the highly restricted relationship between $r_i(u)$ and $r(u)$, the constraint set may still be solved using the Bellman-Ford algorithm as in [6, 7]. Intuitively, the Bellman-Ford technique holds one of the two variables in a two-variable constraint constant while modifying the other variable such that the constraint is met. Holding $r(u)$ constant also holds each value of $r_i(u)$ constant, allowing manipulation of $r(v)$ to meet the constraint requirement.

In a well-formed graph the phase of node $u$ under a retiming follows directly from $[r(u) \bmod k]$, so an actual implementation of the Bellman-Ford approach can make use of a modified path constraint where the correct value of $L_i(u,v)$ is stored in a look-up table array indexed by $[r(u) \bmod k]$. This allows access to the correct number of registers required for a particular retiming value while ignoring the individual phased retiming variables required in the general approach. A variation on the Bellman-Ford algorithm provided in Appendix A makes use of this technique.
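The look-up table idea can be sketched as a small closure. The names `make_path_constraint_bound` and `L_table` are illustrative assumptions rather than identifiers from the paper; the table holds one latch count per phase offset and is indexed by the retiming value modulo $k$.

```python
from typing import Callable, Sequence


def make_path_constraint_bound(
    W_uv: int, L_table: Sequence[int]
) -> Callable[[int], int]:
    """Build the right-hand side of r(u) - r(v) <= W(u,v) - L_i(u,v).

    L_table[i] is the assumed latch count L_i(u, v) for i = 0..k-1.
    """
    k = len(L_table)

    def bound(r_u: int) -> int:
        # Python's % yields a value in 0..k-1 even for negative r_u.
        return W_uv - L_table[r_u % k]

    return bound
```

A solver can then call `bound(r_u)` whenever it needs the constraint weight for the current value of $r(u)$, without ever materializing the individual phased retiming variables.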
8 Summary and Future Work

We have described an efficient method for optimally retiming the class of well-formed, multi-phase, level-clocked circuits using valid clock schedules with arbitrary-length phases. This is not only a large class of circuits widely used in practice; these are also circuits that can be easily produced by current sequential synthesis tools, and optimal retiming of these circuits will become increasingly important.

Our  next  goal  is  to  remove  some  of  the  restrictions  we  have  placed  on  both  circuit  structure  and 
clock  schedules.  Valid  clock  schedules  can  be  redefined  to  assume  a  delay  greater  than  zero  between 
latches  of  specific  phases.  This  introduces  two-sided  constraints  and  the  manipulation  of  minimum 
delays as well as maximum delays. Work along these lines but in a different context has already
been done by Shenoy [11] and Sakallah [5]. Extending the class of circuits beyond well-formed
circuits  places  additional  constraints  on  the  movement  of  latches  in  the  circuit.  These  constraints 
depend  largely  on  the  clock  schedule  itself  and  the  implications  of  removing  the  ordering  constraint 
on  the  correctness  constraints. 

The idea of retiming has also been used in the area of logic synthesis as a way of exposing and applying more of the functional relationships in a sequential circuit. Malik, Sentovich and Brayton describe the technique of peripheral retiming, which allows registers in a sequential circuit to be moved to the periphery of the circuit, thus allowing the global resynthesis of the combinational logic as a single unit [8], and Borriello, Bartlett and Raju have explored the use of localized retiming combined with logic resynthesis to reduce the overall clock period. Our techniques allow this work to be extended to level-clocked circuits.

Sakallah et al. describe a technique whereby the cycle time is minimized by adjusting the
clock  schedule  instead  of  the  circuit  [5].  Typically  there  is  not  much  freedom  in  the  design  of  a 
clock  schedule  as  it  must  conform  to  larger  system  constraints.  However,  it  would  be  interesting 
to  consider  simultaneously  adjusting  the  clock  schedule  and  latch  placement  to  minimize  the  cycle 
time. 

In our circuit graphs, combinational components do not interact with the clock. In CMOS circuit design, however, there are circuits such as precharged logic gates whose inputs and outputs are synchronized to the clock. A future topic of research is to represent these types of combinational logic circuits in our circuit graphs so that retiming can be extended to more of the circuits encountered in practice.

Level-sensitive  circuits  have  long  been  used  for  circuits  where  performance  is  important.  Only 
recently,  however,  have  algorithms  for  analyzing  and  manipulating  these  circuits  become  available. 
The  potential  benefits  of  level-sensitive  circuits  will  make  this  a  very  fertile  area  of  CAD  research 
for  some  time  to  come. 
A Algorithm: Modified Bellman-Ford for Unequal-Phase Retiming


An important impact of introducing phased retiming variables to allow retiming of unequal phase clock schedules is that the resulting ILP constraints no longer have the form of the difference between two variables. The fact that all constraints were of this form was used in edge-clocked and equal-phase retiming to allow solution of the constraint sets by the Bellman-Ford single-source shortest paths algorithm [6, 3]. If this efficient short-cut to solution of the constraint set is no longer available, solving for the optimal clock period of the circuit will require a general ILP solution method and will incur a large additional expense due to the increased number of variables and constraints required to guarantee legal combinations of those variables.

Fortunately it is possible to format the unequal-phase constraint sets in such a manner that a slightly modified version of the Bellman-Ford approach can solve them. This modification makes use of the fact that all phased retiming variables $(r_i(u))$ can be uniquely determined from any given value of $r(u)$. The process of mapping $r(u) \to (r_i(u))$ imposes all of the additional constraints added to the constraint set for Phased Variables and Latch Ordering, and the proper number of latches required on an edge is stored as a complex weighting function where $w(u \to v)$ is dependent on the value of $r(u)$ for each edge.

To  present  the  modified  version  of  the  Bellman-Ford  we  proceed  through  the  identical  steps 
used  in  [3]  to  prove  the  correctness  of  the  standard  algorithm.  The  new  problem  being  solved  may 
be  stated  in  the  following  manner: 

Problem: Given a weighted, directed graph $G = (V, E)$, with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$ mapping edges to real-valued weights dependent on the shortest-path weight $\delta(s, u)$ from a source vertex $s$ to the starting vertex $u$ of the edge $u \to v$, determine $\delta(s, v)$ for all $v \in V$.

Example:  For  a  real-world  analogy  of  the  problem,  assume  you  wish  to  travel  by  plane  from 
New  York  to  Chicago  using  the  least  expensive  route.  Fares  are  discounted  at  each  departure  point 
based  on  the  expense  of  travel  to  that  location.  In  keeping  with  airline  tradition,  for  a  given  arrival 
cost  the  amount  of  discount  is  completely  arbitrary  and  is  contained  in  a  look-up  table  accessed  by 
arrival  cost. 

Figure 16 provides an illustration. The source node, New York, is shown with two paths existing to the destination node Chicago. The value $\delta(s, s) = 0$, so the edge weights from New York to Chicago and Houston are 6 and 10 respectively. This provides the value $\delta(s, \text{Houston}) = 10$; therefore the edge weight from Houston to Chicago is -5. The shortest path from New York to Chicago is then the path through Houston and has a weight of 5.

In our modification of the shortest paths problem we are given a weighted, directed graph $G = (V, E)$, with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$ mapping edges to real-valued weights using a function based on the weight of the shortest path leading to the node at the beginning of the edge. The weight of a path $p = \langle v_0, v_1, \ldots, v_k \rangle$ is the sum of the weights of the edges along the path, in the identical manner to the original:

$$w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$$

The shortest path weight from $u$ to $v$ is:

$$
\delta(u, v) =
\begin{cases}
\min\{ w(p) : u \leadsto v \} & \text{if there is a path from } u \text{ to } v, \\
\infty & \text{otherwise.}
\end{cases}
$$


Figure 16: A simple graph illustrating the shortest path problem with edge weights dependent on the weight of the shortest path leading to the node beginning the edge. In this graph the source node is New York and edge weights of interest are shown in the form $w[\delta(s, u)]$ for each edge $u \to v$.

Beginning with Lemma 25.1 in [3] we substitute our new definition for edge weight and show that each of the proofs leading to use of the Bellman-Ford algorithm still applies.

Lemma 25.1' (Subpaths of shortest paths are shortest paths):

Given a weighted, directed graph $G = (V, E)$ with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$, let $p = \langle v_1, v_2, \ldots, v_k \rangle$ be a shortest path from vertex $v_1$ to vertex $v_k$ and, for any $i$ and $j$ such that $1 \le i \le j \le k$, let $p_{ij} = \langle v_i, v_{i+1}, \ldots, v_j \rangle$ be the subpath of $p$ from vertex $v_i$ to vertex $v_j$. Then $p_{ij}$ is a shortest path from $v_i$ to $v_j$.

Proof: The definition of the weight of a path has not changed from the original, hence this proof follows identically from the original: decomposing the path $p$ into $v_1 \leadsto v_i \leadsto v_j \leadsto v_k$, we have $w(p) = w(p_{1i}) + w(p_{ij}) + w(p_{jk})$. If there is a path $p'_{ij}$ from $v_i$ to $v_j$ with weight $w(p'_{ij}) < w(p_{ij})$, then $v_1 \leadsto v_i \leadsto v_j \leadsto v_k$ taken through $p'_{ij}$ is a path from $v_1$ to $v_k$ whose weight $w(p_{1i}) + w(p'_{ij}) + w(p_{jk})$ is less than $w(p)$, which contradicts the premise that $p$ is the shortest path from $v_1$ to $v_k$. □

Corollary 25.2':

Let $G = (V, E)$ be a weighted, directed graph with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$. Suppose that a shortest path $p$ from a source $s$ to a vertex $v$ can be decomposed into $s \leadsto u \to v$ for some vertex $u$ and path $p'$. Then the weight of a shortest path from $s$ to $v$ is:

$$\delta(s, v) = \delta(s, u) + w(u, v, \delta(s, u)).$$

Proof: By Lemma 25.1', subpath $p'$ is a shortest path from source $s$ to vertex $u$. Thus:

$$\delta(s, v) = w(p) = w(p') + w(u, v, \delta(s, u)) = \delta(s, u) + w(u, v, \delta(s, u)). \ \Box$$

It is now necessary to redefine the relaxation technique to make use of our new definition of edge weighting. Identically to the original work, for each vertex $v \in V$ we maintain an attribute $d[v]$, which is an upper bound on the weight of a shortest path from source $s$ to $v$. $d[v]$ is called a shortest-path estimate. The shortest-path estimates are initialized using the following procedure, identical to the original:

INITIALIZE-SINGLE-SOURCE(G, s)

1. for each vertex v ∈ V[G] do {

2.     d[v] ← ∞;

3.     π[v] ← NIL; }

4. d[s] ← 0;

The relaxation algorithm tests whether we can improve the shortest path to $v$ found so far by going through $u$ and, if so, updates $d[v]$ and $\pi[v]$. The code for performing the relaxation step is only slightly modified from the original in order to account for the more complex edge-weight function.

RELAX(u, v, w)

1. if d[v] > d[u] + w(u, v, d[u]) then {

2.     d[v] ← d[u] + w(u, v, d[u]);

3.     π[v] ← u; }

As shown in the following Lemmas and Corollaries, this new definition of relaxation supports
the  same  properties  required  of  the  original  relaxation  function.  Because  these  key  proofs  are 
supported,  algorithms  for  finding  shortest  paths  based  on  the  relaxation  method  work  for  the  new 
weighting  function  as  well  as  for  the  old. 

Lemma 25.4':

Let $G = (V, E)$ be a weighted, directed graph with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$, and let $u \to v \in E$. Then, immediately after relaxing edge $u \to v$ by executing RELAX(u, v, w), we have $d[v] \le d[u] + w(u, v, d[u])$.

Proof: If just prior to relaxing $u \to v$ we have $d[v] > d[u] + w(u, v, d[u])$, then $d[v] = d[u] + w(u, v, d[u])$ afterward. If, instead, $d[v] \le d[u] + w(u, v, d[u])$ just before the relaxation, then neither $d[u]$ nor $d[v]$ changes, and so $d[v] \le d[u] + w(u, v, d[u])$ afterward. □

Lemma 25.5':

Let $G = (V, E)$ be a weighted, directed graph with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$. Let $s \in V$ be the source vertex, and let the graph be initialized by INITIALIZE-SINGLE-SOURCE(G, s). Then $d[v] \ge \delta(s, v)$ for all $v \in V$, and this invariant is maintained over any sequence of relaxation steps on the edges of $G$. Moreover, once $d[v]$ achieves its lower bound $\delta(s, v)$, it never changes.

Proof: The invariant $d[v] \ge \delta(s, v)$ is true after initialization since $d[s] = 0 \ge \delta(s, s)$ and $d[v] = \infty$ implies $d[v] \ge \delta(s, v)$ for all $v \in V - \{s\}$. Using contradiction, let $v$ be the first vertex for which a relaxation step of an edge $u \to v$ causes $d[v] < \delta(s, v)$. Then, just after relaxing $u \to v$ we have:

$$d[u] + w(u, v, d[u]) = d[v] < \delta(s, v) \le \delta(s, u) + w(u, v, \delta(s, u))$$

which implies that $d[u] < \delta(s, u)$. But because relaxing $u \to v$ does not change $d[u]$, this inequality must have been true just before the edge was relaxed, which contradicts the choice of $v$ as the first vertex for which $d[v] < \delta(s, v)$. □


Corollary 25.6':

Suppose that in a weighted, directed graph $G = (V, E)$ with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$, no path connects a source vertex $s \in V$ to a given vertex $v \in V$. Then after the graph is initialized by INITIALIZE-SINGLE-SOURCE(G, s), we have $d[v] = \delta(s, v)$, and this equality is maintained as an invariant over any sequence of relaxation steps on the edges of $G$.

Proof: By Lemma 25.5', we always have $\infty = \delta(s, v) \le d[v]$; thus $d[v] = \infty = \delta(s, v)$. □


Lemma 25.7':

Let $G = (V, E)$ be a weighted, directed graph with weight function $w : (E, \delta(s, u)) \to \mathbb{R}$, let $s$ be a source vertex, and let $s \leadsto u \to v$ be a shortest path in $G$ for some vertices $u, v \in V$. Suppose that $G$ is initialized by INITIALIZE-SINGLE-SOURCE(G, s) and then a sequence of relaxation steps that includes the call RELAX(u, v, w) is executed on the edges of $G$. If $d[u] = \delta(s, u)$ at any time prior to the call, then $d[v] = \delta(s, v)$ at all times after the call.

Proof: By Lemma 25.5', if $d[u] = \delta(s, u)$ at some point prior to relaxing edge $u \to v$, then this equality holds thereafter. In particular, after relaxing $u \to v$ we have:

$$
\begin{aligned}
d[v] &\le d[u] + w(u, v, d[u]) && \text{(by Lemma 25.4')} \\
&= \delta(s, u) + w(u, v, \delta(s, u)) \\
&= \delta(s, v) && \text{(by Corollary 25.2').}
\end{aligned}
$$

By Lemma 25.5', $\delta(s, v)$ bounds $d[v]$ from below; thus $d[v] = \delta(s, v)$, and this equality is maintained thereafter. □

Now that the above properties of relaxation have been proven for the modified relaxation technique, proofs for shortest paths algorithms dependent on the original relaxation technique may be shown to work for algorithms using the modified technique as well. This includes both Dijkstra's algorithm and the Bellman-Ford algorithm. Because the Bellman-Ford algorithm can handle negative weight edges, it is possible to restate the linear programming problem using difference constraints as a graph upon which a single-source, shortest paths algorithm is run to determine if a feasible solution to the constraint set exists or not.

Using the modified relaxation technique it is possible to restate the more complex constraints formed for unequal-phase retiming as a constraint graph where the weight of each edge is dependent on the weight of a shortest path to the beginning of the edge. In essence the set of variables $(r_1(u), \ldots, r_k(u))$ is determined directly from the modulo function on $r(u)$ and the corresponding constraint weighting is selected. Thus $w(u, v)$ is a function of $r(u)$.
Path and edge constraints for unequal phase retiming may be written as a set $S$ of $m$ sub-sets of linear inequalities where each sub-set is of the form:

$$r(u) - r(v) \le W(u, v) - L_i(u, v) \quad \text{for } i \in \{0 \ldots k-1\}$$

The constraints may be represented as a constraint graph $G = (V, E, w[0 \ldots k-1])$. For each variable $r(u)$ there is a vertex in $V$. For each set of constraints in $S$ there is an edge $u \to v$ in $E$ with weights $w(e)[i] = W(u, v) - L_i(u, v)$, $i \in \{0 \ldots k-1\}$. For each constraint in $S_2$ there is an edge $u \to v$ in $E_2$ with weight $w_2(e) = w(u, v)$.

BELLMAN-FORD(G, w, s)

1. INITIALIZE-SINGLE-SOURCE(G, s);

2. for i ← 1 to |V[G]| - 1 do {

3.     for each edge u → v ∈ E[G] do {

4.         RELAX(u, v, w); } }

5. for each edge u → v ∈ E[G] do {

6.     if d[v] > d[u] + w(u, v, d[u]) return FALSE; }

7. return TRUE;
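The pseudocode above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: each edge carries a callable that maps the current estimate at its starting vertex to a weight, which plays the role of the look-up table, and the function name `bellman_ford_state_weights` is hypothetical. Edges whose starting vertex is still unreached are skipped, since no weight can be looked up for an infinite estimate.

```python
import math
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# An edge is (u, v, weight_fn), where weight_fn(d_u) returns the edge weight
# given the current shortest-path estimate d_u at the starting vertex u.
Edge = Tuple[Hashable, Hashable, Callable[[float], float]]


def bellman_ford_state_weights(
    vertices: List[Hashable], edges: List[Edge], source: Hashable
) -> Tuple[bool, Dict[Hashable, float], Dict[Hashable, Optional[Hashable]]]:
    """Run Bellman-Ford with edge weights that depend on d[u]."""
    # INITIALIZE-SINGLE-SOURCE
    d = {v: math.inf for v in vertices}
    pred: Dict[Hashable, Optional[Hashable]] = {v: None for v in vertices}
    d[source] = 0

    # |V| - 1 passes of the modified RELAX over every edge
    for _ in range(len(vertices) - 1):
        for u, v, weight_fn in edges:
            if d[u] == math.inf:
                continue  # u not reached yet, nothing to look up
            candidate = d[u] + weight_fn(d[u])
            if d[v] > candidate:
                d[v] = candidate
                pred[v] = u

    # Final check: any edge that can still be relaxed means failure
    for u, v, weight_fn in edges:
        if d[u] != math.inf and d[v] > d[u] + weight_fn(d[u]):
            return False, d, pred
    return True, d, pred
```

Applied to the airline example, with a fare of 6 from New York to Chicago, 10 from New York to Houston, and a Houston-to-Chicago fare of -5 when the arrival cost at Houston is 10, the sketch reports a cost of 5 for Chicago reached through Houston.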